fix: preserve inlined MTP layers for GLM5 #1532
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
📝 Walkthrough
Adds streaming safetensors support and helpers to detect and load MTP tensors stored either inline in model shards or in separate shard files, splits loaded tensors into model state-dict entries and orphaned tensors, and includes unit tests and a test utility module.
Changes
MTP Weight Loading Enhancement
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/llm_ptq/example_utils.py`:
- Line 415: The line using json.load(open(index_file)) leaves the file handle
open; change it to use a context manager so the file is closed automatically
(e.g., open index_file with a with statement and pass the file object to
json.load), mirroring the fix used in _load_inlined_mtp_tensors; update the code
that assigns to the variable index to read via the with-block and remove the raw
open(...) call.
- Line 333: Replace the direct open(index_file) call with a context manager to
ensure the file handle is closed: use a with open(index_file, "r") as f: and
call json.load(f) to populate weight_map (the expression that currently assigns
to weight_map should read from the file handle `f`), updating the code around
the weight_map assignment in example_utils.py accordingly.
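A minimal sketch of the context-manager pattern both findings ask for. The helper name `read_weight_map` is hypothetical and is not taken from `example_utils.py`; the `weight_map` key is the one the comment above refers to.

```python
import json
from pathlib import Path


def read_weight_map(index_file: Path) -> dict[str, str]:
    """Return the tensor-name -> shard-filename map from a safetensors index."""
    # The with-block closes the handle even if json.load raises.
    with open(index_file, "r") as f:
        index = json.load(f)
    return index["weight_map"]
```

Compared with `json.load(open(index_file))`, the handle is released deterministically instead of whenever the garbage collector gets to it.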
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8d53a7fd-806c-49a8-bdf1-476fbd1570e1
📒 Files selected for processing (1)
examples/llm_ptq/example_utils.py
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1532      +/-   ##
==========================================
+ Coverage   76.63%   77.03%   +0.39%
==========================================
  Files         476      476
  Lines       51813    52208     +395
==========================================
+ Hits        39707    40217     +510
+ Misses      12106    11991     -115
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
/claude review
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/llm_ptq/example_utils.py`:
- Around line 382-384: The early return when index_file (model_dir /
"model.safetensors.index.json") does not exist drops standalone MTP weights;
instead, change the logic in the block that computes index_file so that when the
index is missing you fall back to the legacy layout: scan model_dir for
standalone safetensors files (e.g., "mtp.safetensors" and "model.safetensors")
and build the returned (set, dict) from those files rather than returning
(set(), {}); update the code that references index_file to only parse the index
if it exists, otherwise construct the weight map from discovered safetensors so
MTP weights are preserved.
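One way to read the requested fallback, as a sketch only: prefer the index when it exists, otherwise glob for standalone shards. `discover_shards` and `INDEX_NAME` are hypothetical names introduced for illustration, and the sketch only lists shard filenames; it does not load tensors.

```python
import json
from pathlib import Path

INDEX_NAME = "model.safetensors.index.json"


def discover_shards(model_dir: Path) -> list[str]:
    """List shard filenames, preferring the index and falling back to a glob."""
    index_file = model_dir / INDEX_NAME
    if index_file.exists():
        with open(index_file, "r") as f:
            weight_map = json.load(f)["weight_map"]
        return sorted(set(weight_map.values()))
    # No index: treat every standalone *.safetensors file as a candidate shard
    # instead of returning early with an empty result.
    return sorted(p.name for p in model_dir.glob("*.safetensors"))
```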
📒 Files selected for processing (3)
examples/llm_ptq/example_utils.py
tests/_test_utils/examples/llm_ptq_example_utils.py
tests/examples/llm_ptq/test_example_utils.py
Claude review passed — no blocking issues found. LGTM
Summary: CRITICAL: 0, IMPORTANT: 0, SUGGESTION: 2
The fix correctly extends load_mtp_weights to detect inlined-MTP layouts (DeepSeek-V3, GLM-5.1 GlmMoeDsa, GLM-4.7) by computing the layer-index window from config.num_nextn_predict_layers + config.num_hidden_layers and streaming matching tensors from on-disk shards via safe_open. Logic verified end-to-end:
- get_inlined_mtp_prefixes returns model.layers.{N..N+K-1} prefixes; the prefix + "." startswith check correctly avoids false matches against neighboring layer indices (e.g. model.layers.78. won't match model.layers.781.x).
- The split into in-state-dict vs orphan tensors handles both DeepSeek-V3 (HF instantiates the extra layers) and GLM-5.1 (orphaned, routed via extra_state_dict).
- The legacy mtp-substring path is preserved unchanged in _scan_separate_file_mtp.
- Downstream call sites (hf_ptq.py:1137 quant exclusion and unified_export_hf.py:802 exclude_modules) work transparently with the new model.layers.N-style prefixes.
- Return signature is unchanged; ordering is now deterministic (sorted), a minor improvement.
- Tests cover the pure prefix-derivation contract.
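The prefix window and the trailing-dot check described in the first bullet can be sketched as follows. The function names and the plain-integer signature are assumptions for illustration; the PR's `get_inlined_mtp_prefixes` takes a config object.

```python
def inlined_mtp_prefixes(num_hidden_layers: int, num_nextn: int) -> list[str]:
    """Prefixes model.layers.{N..N+K-1} for N hidden layers and K MTP layers."""
    return [
        f"model.layers.{i}"
        for i in range(num_hidden_layers, num_hidden_layers + num_nextn)
    ]


def is_inlined_mtp_key(key: str, prefixes: list[str]) -> bool:
    # Appending "." keeps "model.layers.78" from matching "model.layers.781.x".
    return key.startswith(tuple(p + "." for p in prefixes))
```

With `num_nextn == 0` the prefix list is empty, and `str.startswith` with an empty tuple returns `False`, so non-MTP configs match nothing.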
Two non-blocking SUGGESTIONs left as inline comments (a docstring reference to a non-existent MTP_DETECTION.md, and minor defensive-asymmetry in get_inlined_mtp_prefixes). CodeRabbit already flagged json.load(open(...)) file-handle leaks at lines 337 and 386 — not duplicating.
Risk: low. Scoped to an example utility, with a fallback path that preserves prior behavior for non-MTP configs (num_nextn_predict_layers absent or 0).
cjluo-nv left a comment
Bot review — DM the bot to share feedback.
Bug fix is reasonable and well-structured: the existing "mtp" in key substring detection genuinely misses GLM-5.1's inlined model.layers.{i} convention, and splitting into get_inlined_mtp_prefixes + _load_inlined_mtp_tensors + _scan_separate_file_mtp + _apply_to_model_state_dict is cleaner than the prior monolithic function. The single-file model.safetensors fallback in _load_inlined_mtp_tensors is also a nice secondary improvement (legacy path only handled the sharded case).
Flagging for human sign-off because:
- Test coverage is thin. The only new test exercises the trivial 4-line get_inlined_mtp_prefixes (config → list of strings). The actual on-disk loader (_load_inlined_mtp_tensors), the orphan-vs-state-dict split (_apply_to_model_state_dict), and the integrated load_mtp_weights flow are untested. The PR body says the author verified on "a mini GLM-5.1 fixture (4 hidden layers + 1 inlined MTP at model.layers.4, 7 synthesized MTP tensors)"; that fixture would make a solid unit test for the loader path, and it's odd not to include it. The bug being fixed (silent tensor drop) is exactly the kind of regression a fixture-based test would catch.
- End-to-end verification is still pending. Author explicitly says "To be verified with full model", i.e. no full GLM-5.1 export run yet.
- Stale docstring reference. load_mtp_weights now points readers to examples/llm_ptq/MTP_DETECTION.md ("See Also") but that file does not exist in the PR or the repo. Either add the doc or drop the reference before merge.
- Minor: int(getattr(config, "num_nextn_predict_layers", 0)) will raise TypeError if the attribute is present but None (some HF configs do this); getattr(..., 0) or 0 would be safer.
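The present-but-`None` case from the last bullet can be reproduced with a stand-in config. This is a sketch: `num_mtp_layers` is a hypothetical helper, and `SimpleNamespace` stands in for an HF config object.

```python
from types import SimpleNamespace


def num_mtp_layers(config) -> int:
    # `or 0` covers both a missing attribute and one explicitly set to None.
    return int(getattr(config, "num_nextn_predict_layers", 0) or 0)


def num_mtp_layers_naive(config) -> int:
    # The getattr default only applies when the attribute is absent,
    # so an explicit None reaches int() and raises TypeError.
    return int(getattr(config, "num_nextn_predict_layers", 0))
```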
cjluo-nv left a comment
Bot review — DM the bot to share feedback.
Per Slack request: posted simplification ideas. The inlined- and separate-file MTP paths duplicate the "walk index → stream matching tensors" logic with different I/O APIs and different fallback behavior, and the prefix-extraction concern is mixed into the separate-file scanner. Unifying both around a single _load_tensors_matching(model_dir, predicate) helper (using safe_open consistently, with single-file fallback) plus a pure _keys_to_prefixes(keys) extractor would shrink the diff meaningfully and close the no-index regression on the legacy path that CodeRabbit flagged. Also noting unresolved prior comments: stale MTP_DETECTION.md "See Also" pointer, int(None) TypeError if num_nextn_predict_layers is present-but-None, and the loader/orphan-split paths still have no fixture-based test (the PR body's mini-GLM-5.1 fixture would be a natural unit test).
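A rough sketch of the "one scanner, one predicate" shape suggested here, under loud assumptions: it matches tensor names only and does not load tensor data, and it reads names by parsing the safetensors header directly (an 8-byte little-endian length followed by a JSON header) so that it runs without the `safetensors` package. The PR itself uses `safe_open`; `read_safetensors_keys` and `keys_matching` are hypothetical names.

```python
import json
import struct
from pathlib import Path
from typing import Callable


def read_safetensors_keys(path: Path) -> list[str]:
    """Read tensor names from a safetensors header without loading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    return [k for k in header if k != "__metadata__"]


def keys_matching(model_dir: Path, predicate: Callable[[str], bool]) -> dict[str, str]:
    """Map each matching tensor name to the shard file that holds it."""
    index_file = model_dir / "model.safetensors.index.json"
    if index_file.exists():
        with open(index_file, "r") as f:
            weight_map = json.load(f)["weight_map"]
        return {k: shard for k, shard in weight_map.items() if predicate(k)}
    # No index: scan every standalone shard's header instead.
    found = {}
    for shard in sorted(model_dir.glob("*.safetensors")):
        for key in read_safetensors_keys(shard):
            if predicate(key):
                found[key] = shard.name
    return found
```

Because both layouts go through the same function, the no-index fallback cannot be forgotten on one path and kept on the other.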
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
🧹 Nitpick comments (1)
tests/examples/llm_ptq/test_example_utils.py (1)
112-133: ⚡ Quick win
Add one test for the sharded-index (model.safetensors.index.json) path.
Line 112 currently validates the no-index fallback/standalone-file flow well, but the PR also adds index-walk behavior and this branch is still unexercised. A focused test with two shards + model.safetensors.index.json would lock down the regression surface.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/examples/llm_ptq/test_example_utils.py`:
- Around line 112-133: Add a new test mirroring
test_load_mtp_weights_separate_standalone_file but exercising the sharded-index
path: use _write_safetensors to create two shard files (e.g.,
model.safetensors.0000 and model.safetensors.0001) containing distinct keys
(e.g., "mtp.fc.weight" in one shard and "mtp.layers.0.q_proj.weight" in the
other), write a corresponding model.safetensors.index.json that maps those
tensor names to the appropriate shard filenames, instantiate _FakeModel (as in
the existing test) and call example_utils.load_mtp_weights(model,
str(tmp_path)), then assert the returned prefixes and orphans include the
expected "mtp" prefixes and the two orphan keys; reference
example_utils.load_mtp_weights, _write_safetensors, _FakeModel and the index
file name model.safetensors.index.json when locating code to modify.
📒 Files selected for processing (2)
examples/llm_ptq/example_utils.py
tests/examples/llm_ptq/test_example_utils.py
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
🧹 Nitpick comments (1)
tests/_test_utils/examples/llm_ptq_example_utils.py (1)
30-30: ⚡ Quick win
Move __all__ near the top-level declarations.
Line 30 defines the module's public surface, but this repo convention expects __all__ at the top of the module for discoverability and consistency.
Proposed adjustment
     import sys

     from _test_utils.examples.run_command import MODELOPT_ROOT

    +__all__ = ["example_utils"]
    +
     _LLM_PTQ_DIR = MODELOPT_ROOT / "examples" / "llm_ptq"
     if str(_LLM_PTQ_DIR) not in sys.path:
         sys.path.insert(0, str(_LLM_PTQ_DIR))

     import example_utils
    -
    -__all__ = ["example_utils"]
As per coding guidelines: "Define the public API with __all__ at the top of each module and re-export via from .module import * in __init__.py files".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/_test_utils/examples/llm_ptq_example_utils.py`:
- Line 30: Move the module-level __all__ declaration from line 30 up to the top
of the module immediately after the import block and before any
class/function/variable declarations so the public API is declared at top-level;
keep the symbol list unchanged (["example_utils"]) and ensure the module still
exports that name via __all__.
📒 Files selected for processing (3)
examples/llm_ptq/example_utils.py
tests/_test_utils/examples/llm_ptq_example_utils.py
tests/examples/llm_ptq/test_example_utils.py
cjluo-nv left a comment
Bot review — DM the bot to share feedback.
Re-review pass — the previous round of simplification work has all landed and is in good shape:
- 💬 Author replied "Done" on the unified scanner: _load_tensors_matching + _keys_to_prefixes cleanly separate I/O from prefix derivation, and the safe_open migration kills the legacy load_file whole-shard read.
- 💬 Author replied "No, just removed" on MTP_DETECTION.md: confirmed gone from the docstring.
- 💬 int(None) fix (or 0) applied; file handles wrapped in with; no-index fallback restored via model_dir.glob("*.safetensors").
- 💬 Test coverage substantially expanded: all four conventions (inlined-orphaned, inlined-in-state-dict, separate-standalone, separate-indexed) plus the num_nextn_predict_layers=None regression now have fixture tests. This addresses the prior "test coverage is thin" concern.
Flagging for human sign-off only because of the additional simplifications the operator asked about — none are correctness issues, all are nits:
- predicate(key, shard_name) second arg looks dead. The only consumer of shard_name is "mtp" in shard_name in load_mtp_weights.predicate. For every real MTP convention enumerated in the docstring, the "mtp" in key branch already matches (Qwen3-Next, GLM-4.7) or the inlined-prefix branch matches (GLM-5.1, DeepSeek-V3). It's hard to construct a checkpoint where MTP weights would only be detected via shard filename. Consider dropping the shard_name parameter from the predicate signature: _load_tensors_matching becomes Callable[[str], bool] and the predicate body collapses to two clauses. If you want to keep the defensive shard-name fallback, document the case it covers; right now it reads as belt-and-suspenders.
- Redundant if inlined_tuple guard in predicate. key.startswith(()) already returns False, so if inlined_tuple and key.startswith(inlined_tuple) simplifies to key.startswith(inlined_tuple).
- Awkward conditional + operator-precedence reading hazard on the prefixes line. prefixes = inlined_prefixes | _keys_to_prefixes(separate_keys) if tensors else set() parses as (A | B) if tensors else set() (correct), but the |-vs-ternary precedence is non-obvious. Cleaner as an early return:
      if not tensors:
          return [], {}
      prefixes = inlined_prefixes | _keys_to_prefixes(separate_keys)
  This also drops the now-unreachable if prefixes: guard around the print.
- Minor: _keys_to_prefixes is only safe for separate-file keys (it would emit "model" as a prefix if fed an inlined key like "model.layers.78.eh_proj"). The current call site filters via separate_keys = [k for k in tensors if not k.startswith(inlined_tuple)] so it's fine, but a one-line docstring note ("caller must filter out inlined keys") would prevent future misuse.
- Docstring style: get_inlined_mtp_prefixes and _keys_to_prefixes open with lowercase verbs ("turn an HF config…", "invert separate-file MTP keys…"); convention is capitalized imperative.
None of these block correctness; all are tighten-up suggestions per the operator's "additional simplification" ask.
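The two Python behaviors these nits rely on can be checked directly. The values below are made up for illustration; only the language semantics are the point.

```python
inlined_prefixes = {"model.layers.78"}
separate_prefixes = {"mtp"}

# A conditional expression binds more loosely than `|`, so this parses as
# (inlined_prefixes | separate_prefixes) if tensors else set().
tensors = {}
prefixes_when_empty = inlined_prefixes | separate_prefixes if tensors else set()

tensors = {"mtp.fc.weight": 0}
prefixes_when_found = inlined_prefixes | separate_prefixes if tensors else set()

# str.startswith with an empty tuple never matches, so a separate
# `if inlined_tuple and ...` guard adds nothing.
empty_tuple_matches = "model.layers.78.x".startswith(())
```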
    def _load_tensors_matching(
        model_dir: Path, predicate: Callable[[str, str | None], bool]
    ) -> dict[str, torch.Tensor]:
Bot comment.
Minor: _keys_to_prefixes is only correct for separate-file keys — if fed an inlined key like "model.layers.78.eh_proj.weight" it would emit "model" as a top-level prefix, which is not what the exporter wants. The current caller filters via separate_keys = [k for k in tensors if not k.startswith(inlined_tuple)] so this is safe today; worth a one-line note in the docstring ("caller must filter inlined keys first") to prevent future misuse.
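To make the hazard concrete, here is a sketch that assumes `_keys_to_prefixes` collapses each key to its first dotted component. That behavior is inferred from this comment, not copied from the PR, and `keys_to_prefixes` is a stand-in name.

```python
def keys_to_prefixes(keys):
    """Collapse tensor names to their top-level component (assumed behavior).

    Caller must filter out inlined keys first.
    """
    return {k.split(".", 1)[0] for k in keys}


inlined_tuple = ("model.layers.78.",)
tensors = ["mtp.fc.weight", "model.layers.78.eh_proj.weight"]

# Filtering before derivation keeps "model" out of the exclusion prefixes.
separate_keys = [k for k in tensors if not k.startswith(inlined_tuple)]
```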
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
cjluo-nv left a comment
Bot review — DM the bot to share feedback.
Re-review: previous round's findings are addressed.
- 💬 File-handle leaks at lines 333/415: both now use with open(...) as f: context managers.
- 💬 No-index fallback (CodeRabbit major): _load_tensors_matching now globs *.safetensors independently of the index, restoring the standalone-shard path.
- 💬 Author replied "No, just removed": MTP_DETECTION.md "See Also" reference is gone.
- 💬 int(None) regression: fixed via or 0, plus an explicit num_nextn_predict_layers=None test.
- 💬 Author replied "Done" on unifying the two scanners: _load_tensors_matching + _keys_to_prefixes cleanly split I/O from prefix derivation; safe_open replaces load_file.
- 💬 Test coverage substantially expanded: 5 fixture-based tests across all four conventions (inlined-orphaned, inlined-in-state-dict, separate-standalone, separate-indexed) plus the None regression. The "thin coverage" concern is resolved.
- The prior round's tighten-up nits (dead shard_name predicate arg, redundant if inlined_tuple guard, ternary precedence on the prefixes line, caller-must-filter docstring note for _keys_to_prefixes) are all reflected in the current code. Only remaining items are docstring-capitalization nits, which are non-blocking.
End-to-end full-model verification is still pending per the PR body, but that's an operational gate the author has flagged explicitly and not something a unit-test review can substitute for.
🧹 Nitpick comments (2)
examples/llm_ptq/example_utils.py (2)
433-437: ⚡ Quick win
Use rank-aware logging for this new status message.
A raw print() here will fire on every rank during distributed runs. Please route it through print_rank_0/warn_rank_0 instead.
As per coding guidelines, "Use print_rank_0 or warn_rank_0 when possible to avoid noisy logs and guard shared side effects against race conditions between ranks in distributed processing".
421-429: ⚡ Quick win
Tighten separate-file detection to the documented mtp.* namespace.
_keys_to_prefixes() and the support matrix both assume separate-file tensors are keyed under top-level mtp.*. Matching any "mtp" substring can still pull in unrelated keys and derive overly broad exclusions from them.
Suggested change
     def predicate(key: str) -> bool:
    -    return key.startswith(inlined_tuple) or "mtp" in key
    +    return key.startswith(inlined_tuple) or key.startswith("mtp.")
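The practical difference between the substring check and the namespace check can be illustrated on a few keys. The third key is invented to show a false positive and does not come from any real checkpoint; whether a supported checkpoint nests MTP weights under another namespace is not established in this thread.

```python
def substring_predicate(key: str) -> bool:
    return "mtp" in key


def namespace_predicate(key: str) -> bool:
    return key.startswith("mtp.")


keys = [
    "mtp.fc.weight",
    "mtp.layers.0.q_proj.weight",
    "model.layers.3.smtp_gate.weight",  # hypothetical unrelated key
]

substring_hits = [k for k in keys if substring_predicate(k)]
namespace_hits = [k for k in keys if namespace_predicate(k)]
```

The stricter check trades recall for precision: it drops the unrelated key, but it would also miss any MTP tensor that is not keyed under a top-level `mtp.` namespace.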
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@examples/llm_ptq/example_utils.py`:
- Around line 433-437: Replace the raw print(...) that reports detected MTP
tensors with a rank-aware logger: call print_rank_0 (or warn_rank_0 if this
should be a warning) instead, passing the same formatted string that uses
tensors, prefixes, and not_in_state_dict to build the message; locate the print
call in example_utils.py (the f-string referencing len(tensors),
sorted(prefixes), and len(not_in_state_dict)) and simply swap print(...) for
print_rank_0(...) so only rank 0 emits the message during distributed runs.
- Around line 421-429: The current predicate and separate_keys logic treat any
key containing "mtp" as a separate-file tensor, which is too broad; change the
predicate in the _load_tensors_matching call to test top-level mtp namespace
(e.g., use key.startswith(inlined_tuple) or key.startswith("mtp.") ) and update
separate_keys to exclude keys that start with the inlined_tuple or start with
"mtp." (so separate_keys = [k for k in tensors if not
k.startswith(inlined_tuple) and not k.startswith("mtp.")]); keep prefixes =
inlined_prefixes | _keys_to_prefixes(separate_keys).
📒 Files selected for processing (1)
examples/llm_ptq/example_utils.py
What does this PR do?
Type of change: Bug fix
Extends load_mtp_weights to detect inlined MTP layers (keys model.layers.{i}.* for i in [num_hidden, num_hidden + num_nextn_predict_layers)) in addition to the existing mtp.* separate-file convention.
Bug. load_mtp_weights() only matched the substring "mtp" in safetensors keys. GLM-5.1 (GlmMoeDsaForCausalLM) stores MTP at model.layers.78.* with no mtp substring, so detection returned ([], {}), _mtp_layer_prefixes was never set, and MTP tensors were silently dropped from the exported safetensors (had to be re-added manually).
Detection.
- Read config.num_nextn_predict_layers (the model's own declaration of how many MTP layers exist).
- Derive prefixes model.layers.{i} for i in range(num_hidden, num_hidden + num_nextn).
- Stream matching tensors via safe_open (walks model.safetensors.index.json if present, else falls back to the single shard).
- Split by whether model.state_dict() has a slot for them:
  - In model.state_dict() → model.load_state_dict(..., strict=False) (DeepSeek-V3 case: HF instantiates the extra layers).
  - Not in model.state_dict() → returned as not_in_state_dict so the exporter routes them through extra_state_dict (GLM-5.1 case: GlmMoeDsaModel in transformers ≥5.7 only builds num_hidden decoders, leaving MTP keys orphaned at from_pretrained time).
The returned prefixes flow into the existing plumbing (_mtp_layer_prefixes → quant_cfg disable + quantization_config.exclude_modules), which is unchanged.
Usage
# Add a code snippet demonstrating how to use this
Testing
Verified end-to-end on a mini GLM-5.1 fixture (4 hidden layers + 1 inlined MTP at model.layers.4, 7 synthesized MTP tensors mirroring the full GLM-5.1 layout).
To be verified with full model.
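The in-state-dict versus orphaned split from the Detection steps can be sketched with plain dictionaries. `split_by_state_dict` is a hypothetical helper and the integer values stand in for tensors; the key layout mirrors the mini fixture (4 hidden layers, MTP at `model.layers.4`).

```python
def split_by_state_dict(tensors: dict, state_dict_keys: set) -> tuple[dict, dict]:
    """Separate tensors the model has a slot for from orphaned ones."""
    in_state_dict = {k: v for k, v in tensors.items() if k in state_dict_keys}
    not_in_state_dict = {k: v for k, v in tensors.items() if k not in state_dict_keys}
    return in_state_dict, not_in_state_dict


mtp_tensors = {"model.layers.4.eh_proj.weight": 1, "model.layers.4.enorm.weight": 2}

# DeepSeek-V3-like case: the model instantiated the extra layer.
loadable, orphaned = split_by_state_dict(mtp_tensors, set(mtp_tensors))

# GLM-5.1-like case: only layers 0..3 exist, so every MTP key is orphaned.
loadable_glm, orphaned_glm = split_by_state_dict(
    mtp_tensors, {"model.layers.3.mlp.weight"}
)
```

In the first case the tensors would go to `load_state_dict(..., strict=False)`; in the second they would be returned so the exporter can route them through `extra_state_dict`.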
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).
Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).
CONTRIBUTING.md: ✅ / ❌ / N/A