
[Bug fix] Fake quantized model save after HF accelerate hooks are added #906

Merged
realAsma merged 1 commit into main from asma/TQ-save-fix-accelerate on Feb 19, 2026

[Bug fix] Fake quantized model save after HF accelerate hooks are added#906
realAsma merged 1 commit intomainfrom
asma/TQ-save-fix-accelerate

Conversation

@realAsma (Contributor) commented Feb 18, 2026

What does this PR do?

Type of change: Bug fix

Overview: Fix AttributeError: Can't get local object 'add_hook_to_module.<locals>.new_forward' when saving a quantized model a second time after restoring it with device_map="auto".

When a model is loaded with device_map="auto", accelerate's add_hook_to_module patches every submodule (including TensorQuantizer instances) and injects three instance attributes: _hf_hook, _old_forward, and forward (a functools.partial wrapping a local function). These are not picklable and were leaking into the modelopt state dict collected by get_modelopt_state(), causing torch.save to fail.
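
To make the injection concrete, here is a small stand-alone snippet (a hedged sketch: it uses accelerate's base ModelHook and a plain nn.Linear as a stand-in for a hooked TensorQuantizer):

import torch.nn as nn
from accelerate.hooks import ModelHook, add_hook_to_module

m = nn.Linear(4, 4)
add_hook_to_module(m, ModelHook())
print(hasattr(m, "_hf_hook"), hasattr(m, "_old_forward"))  # True True
print(type(m.forward).__name__)  # 'partial': wraps the local new_forward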

This PR adds the three accelerate-injected attributes to TensorQuantizer._skip_properties_for_save_restore so they are excluded from the serialized state, matching the existing pattern used for HuggingFace and DeepSpeed attributes.
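
A minimal sketch of the change, with the set's existing entries elided (the actual contents live in tensor_quantizer.py):

class TensorQuantizer(nn.Module):
    # Instance attributes excluded from the serialized modelopt state
    _skip_properties_for_save_restore = {
        # ... existing HuggingFace/DeepSpeed entries ...
        "_hf_hook",      # accelerate's hook object
        "_old_forward",  # original forward saved by accelerate
        "forward",       # functools.partial over a local function (unpicklable)
    }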

Usage

import modelopt.torch.opt as mto
import modelopt.torch.quantization as mtq
from transformers import AutoModelForCausalLM

mto.enable_huggingface_checkpointing()

# Quantize and save
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")  # name: model ID or local path
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop=forward_loop)  # forward_loop: user-supplied calibration loop
model.save_pretrained(save_dir)

# Restore and save again (this previously failed)
model2 = AutoModelForCausalLM.from_pretrained(save_dir, device_map="auto")
model2.save_pretrained(save_dir_round2)  # now works

Testing

  • Added unit test test_tensor_quantizer_modelopt_state_with_accelerate_hook in tests/unit/torch/quantization/plugins/test_accelerate.py that verifies accelerate hook attributes are excluded from the modelopt state and that the state dict remains picklable; a condensed sketch follows.
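
A condensed sketch of the test (the in-repo version may differ in its exact assertions):

import pickle

from accelerate.hooks import ModelHook, add_hook_to_module
from modelopt.torch.quantization.nn import TensorQuantizer


def test_tensor_quantizer_modelopt_state_with_accelerate_hook():
    tq = TensorQuantizer()
    add_hook_to_module(tq, ModelHook())  # injects _hf_hook, _old_forward, forward

    state = tq.get_modelopt_state()
    leaked = {"_hf_hook", "_old_forward", "forward"} & set(state)
    assert not leaked, f"accelerate attributes leaked into modelopt state: {leaked}"
    pickle.dumps(state)  # must not raise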

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes — only adds entries to a skip set; existing saved checkpoints are unaffected.
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: No (internal fix, no API change)
  • Did you update Changelog?: No

Additional Information

The root cause is in accelerate's add_hook_to_module, which defines new_forward as a local function and binds it via functools.partial onto module.forward. Since local functions cannot be pickled, any TensorQuantizer that has been hooked by accelerate becomes unserializable unless these attributes are excluded.
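
A self-contained illustration of that constraint, using a toy stand-in rather than accelerate's actual code:

import functools
import pickle

class Dummy:
    pass

def add_hook(obj):
    def new_forward():  # local function: pickle cannot reference it by qualified name
        return "hooked"
    obj.forward = functools.partial(new_forward)

d = Dummy()
add_hook(d)
try:
    pickle.dumps(d)  # fails while pickling d.__dict__["forward"]
except AttributeError as err:
    print(err)  # message points at 'add_hook.<locals>.new_forward'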

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced compatibility with accelerate library by excluding framework-specific hooks and attributes from model state serialization, preventing issues during save/restore operations.
  • Tests

    • Added test to validate that accelerate-related attributes are properly excluded from model state and that the state remains picklable.
  • Public API

    • TensorQuantizer is now publicly exported.

@realAsma realAsma requested a review from a team as a code owner February 18, 2026 21:09
@realAsma realAsma requested a review from mxinO February 18, 2026 21:09
coderabbitai bot (Contributor) commented Feb 18, 2026

📝 Walkthrough

Updates TensorQuantizer to exclude HuggingFace/accelerate-related attributes (_hf_hook, _old_forward, forward) from save/restore metadata. Adds test verifying these attributes are excluded and model state remains picklable. Publicly exports TensorQuantizer from the quantization module.

Changes

  • TensorQuantizer Save/Restore Configuration (modelopt/torch/quantization/nn/modules/tensor_quantizer.py):
    Extends the _skip_properties_for_save_restore set with three new entries (_hf_hook, _old_forward, and forward) to exclude accelerate/HuggingFace hook metadata from serialization.
  • Accelerate Integration Tests & Public API (tests/unit/torch/quantization/plugins/test_accelerate.py):
    Adds new test test_tensor_quantizer_modelopt_state_with_accelerate_hook to verify accelerate hooks are excluded from modelopt state and pickling works correctly. Updates imports and exports TensorQuantizer publicly from modelopt.torch.quantization.nn.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks (3 passed)
  • Description check: ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled).
  • Title check: ✅ Passed. The title clearly identifies a bug fix related to saving fake quantized models after HuggingFace accelerate hooks are added, accurately reflecting the main change in the changeset.
  • Docstring coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; check skipped.



Signed-off-by: realAsma <akuriparambi@nvidia.com>
@realAsma realAsma force-pushed the asma/TQ-save-fix-accelerate branch from f084240 to 27a0fb6 on February 18, 2026 21:12
coderabbitai bot (Contributor) left a comment

🧹 Nitpick comments (1)
tests/unit/torch/quantization/plugins/test_accelerate.py (1)

58-82: Consider adding a round-trip save/restore test and properties_only=True coverage.

The current test validates exclusion and picklability, but doesn't verify that set_from_modelopt_state still correctly restores a quantizer when accelerate hooks are active, nor that the properties_only=True variant is equally clean. Both gaps are low-risk (the skip set is shared by all code paths), but explicit coverage would prevent regressions.

♻️ Suggested additions
    # Also verify properties_only=True path
    state_props_only = tq.get_modelopt_state(properties_only=True)
    leaked_props = accelerate_attrs & state_props_only.keys()
    assert not leaked_props, f"Accelerate attributes leaked (properties_only=True): {leaked_props}"
    pickle.dumps(state_props_only)

    # Round-trip: restore a fresh TQ from the saved state
    tq2 = TensorQuantizer()
    add_hook_to_module(tq2, ModelHook())
    tq2.set_from_modelopt_state(state)
    assert tq2.num_bits == tq.num_bits

codecov bot commented Feb 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.54%. Comparing base (9e38041) to head (27a0fb6).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #906   +/-   ##
=======================================
  Coverage   73.54%   73.54%           
=======================================
  Files         205      205           
  Lines       22000    22000           
=======================================
  Hits        16179    16179           
  Misses       5821     5821           


@realAsma realAsma merged commit c4b662f into main on Feb 19, 2026; 37 checks passed
@realAsma realAsma deleted the asma/TQ-save-fix-accelerate branch February 19, 2026 13:39