
[Bug fix] Fake quantized model save after HF accelerate hooks are added #906

Merged
realAsma merged 1 commit into main from asma/TQ-save-fix-accelerate on Feb 19, 2026

[Bug fix] Fake quantized model save after HF accelerate hooks are added#906
realAsma merged 1 commit intomainfrom
asma/TQ-save-fix-accelerate

Conversation

@realAsma (Contributor) commented Feb 18, 2026

What does this PR do?

Type of change: Bug fix

Overview: Fix AttributeError: Can't get local object 'add_hook_to_module.<locals>.new_forward' when saving a quantized model a second time after restoring it with device_map="auto".

When a model is loaded with device_map="auto", accelerate's add_hook_to_module patches every submodule (including TensorQuantizer instances) and injects three instance attributes: _hf_hook, _old_forward, and forward (a functools.partial wrapping a local function). These are not picklable and were leaking into the modelopt state dict collected by get_modelopt_state(), causing torch.save to fail.
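
To make the injection concrete, here is a small stand-alone snippet (a hedged sketch: it uses accelerate's base ModelHook and a plain nn.Linear as a stand-in for a hooked TensorQuantizer):

import torch.nn as nn
from accelerate.hooks import ModelHook, add_hook_to_module

m = nn.Linear(4, 4)
add_hook_to_module(m, ModelHook())
print(hasattr(m, "_hf_hook"), hasattr(m, "_old_forward"))  # True True
print(type(m.forward).__name__)  # 'partial': wraps the local new_forward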

This PR adds the three accelerate-injected attributes to TensorQuantizer._skip_properties_for_save_restore so they are excluded from the serialized state, matching the existing pattern used for HuggingFace and DeepSpeed attributes.
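
A minimal sketch of the change, with the set's existing entries elided (the actual contents live in tensor_quantizer.py):

class TensorQuantizer(nn.Module):
    # Instance attributes excluded from the serialized modelopt state
    _skip_properties_for_save_restore = {
        # ... existing HuggingFace/DeepSpeed entries ...
        "_hf_hook",      # accelerate's hook object
        "_old_forward",  # original forward saved by accelerate
        "forward",       # functools.partial over a local function (unpicklable)
    }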

Usage

import modelopt.torch.opt as mto
import modelopt.torch.quantization as mtq
from transformers import AutoModelForCausalLM

mto.enable_huggingface_checkpointing()

# Quantize and save
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")  # name: model ID or local path
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop=forward_loop)  # forward_loop: user-supplied calibration loop
model.save_pretrained(save_dir)

# Restore and save again (this previously failed)
model2 = AutoModelForCausalLM.from_pretrained(save_dir, device_map="auto")
model2.save_pretrained(save_dir_round2)  # now works

Testing

  • Added unit test test_tensor_quantizer_modelopt_state_with_accelerate_hook in tests/unit/torch/quantization/plugins/test_accelerate.py that verifies accelerate hook attributes are excluded from the modelopt state and that the state dict remains picklable; a condensed sketch follows.
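
A condensed sketch of the test (the in-repo version may differ in its exact assertions):

import pickle

from accelerate.hooks import ModelHook, add_hook_to_module
from modelopt.torch.quantization.nn import TensorQuantizer


def test_tensor_quantizer_modelopt_state_with_accelerate_hook():
    tq = TensorQuantizer()
    add_hook_to_module(tq, ModelHook())  # injects _hf_hook, _old_forward, forward

    state = tq.get_modelopt_state()
    leaked = {"_hf_hook", "_old_forward", "forward"} & set(state)
    assert not leaked, f"accelerate attributes leaked into modelopt state: {leaked}"
    pickle.dumps(state)  # must not raise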

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes — only adds entries to a skip set; existing saved checkpoints are unaffected.
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: No (internal fix, no API change)
  • Did you update Changelog?: No

Additional Information

The root cause is in accelerate's add_hook_to_module, which defines new_forward as a local function and binds it via functools.partial onto module.forward. Since local functions cannot be pickled, any TensorQuantizer that has been hooked by accelerate becomes unserializable unless these attributes are excluded.
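
A self-contained illustration of that constraint, using a toy stand-in rather than accelerate's actual code:

import functools
import pickle

class Dummy:
    pass

def add_hook(obj):
    def new_forward():  # local function: pickle cannot reference it by qualified name
        return "hooked"
    obj.forward = functools.partial(new_forward)

d = Dummy()
add_hook(d)
try:
    pickle.dumps(d)  # fails while pickling d.__dict__["forward"]
except AttributeError as err:
    print(err)  # message points at 'add_hook.<locals>.new_forward'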

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced compatibility with accelerate library by excluding framework-specific hooks and attributes from model state serialization, preventing issues during save/restore operations.
  • Tests

    • Added test to validate that accelerate-related attributes are properly excluded from model state and that the state remains picklable.
  • Public API

    • TensorQuantizer is now publicly exported.

@realAsma realAsma requested a review from a team as a code owner February 18, 2026 21:09
@realAsma realAsma requested a review from mxinO February 18, 2026 21:09
coderabbitai bot (Contributor) commented Feb 18, 2026

📝 Walkthrough

Updates TensorQuantizer to exclude HuggingFace/accelerate-related attributes (_hf_hook, _old_forward, forward) from save/restore metadata. Adds test verifying these attributes are excluded and model state remains picklable. Publicly exports TensorQuantizer from the quantization module.

Changes

  • TensorQuantizer Save/Restore Configuration (modelopt/torch/quantization/nn/modules/tensor_quantizer.py):
    Extends the _skip_properties_for_save_restore set with three new entries (_hf_hook, _old_forward, and forward) to exclude accelerate/HuggingFace hook metadata from serialization.
  • Accelerate Integration Tests & Public API (tests/unit/torch/quantization/plugins/test_accelerate.py):
    Adds new test test_tensor_quantizer_modelopt_state_with_accelerate_hook to verify accelerate hooks are excluded from modelopt state and pickling works correctly. Updates imports and exports TensorQuantizer publicly from modelopt.torch.quantization.nn.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks (3 passed)
  • Description check: ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled).
  • Title check: ✅ Passed. The title clearly identifies a bug fix related to saving fake quantized models after HuggingFace accelerate hooks are added, accurately reflecting the main change in the changeset.
  • Docstring coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; check skipped.



Signed-off-by: realAsma <akuriparambi@nvidia.com>
@realAsma realAsma force-pushed the asma/TQ-save-fix-accelerate branch from f084240 to 27a0fb6 on February 18, 2026 21:12
coderabbitai bot (Contributor) left a comment

🧹 Nitpick comments (1)
tests/unit/torch/quantization/plugins/test_accelerate.py (1)

58-82: Consider adding a round-trip save/restore test and properties_only=True coverage.

The current test validates exclusion and picklability, but doesn't verify that set_from_modelopt_state still correctly restores a quantizer when accelerate hooks are active, nor that the properties_only=True variant is equally clean. Both gaps are low-risk (the skip set is shared by all code paths), but explicit coverage would prevent regressions.

♻️ Suggested additions
    # Also verify properties_only=True path
    state_props_only = tq.get_modelopt_state(properties_only=True)
    leaked_props = accelerate_attrs & state_props_only.keys()
    assert not leaked_props, f"Accelerate attributes leaked (properties_only=True): {leaked_props}"
    pickle.dumps(state_props_only)

    # Round-trip: restore a fresh TQ from the saved state
    tq2 = TensorQuantizer()
    add_hook_to_module(tq2, ModelHook())
    tq2.set_from_modelopt_state(state)
    assert tq2.num_bits == tq.num_bits

codecov bot commented Feb 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.54%. Comparing base (9e38041) to head (27a0fb6).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #906   +/-   ##
=======================================
  Coverage   73.54%   73.54%           
=======================================
  Files         205      205           
  Lines       22000    22000           
=======================================
  Hits        16179    16179           
  Misses       5821     5821           


@realAsma realAsma merged commit c4b662f into main on Feb 19, 2026; 37 checks passed
@realAsma realAsma deleted the asma/TQ-save-fix-accelerate branch February 19, 2026 13:39