
[NVBug 6102977] Disable use_cache during export dummy forward for custom configs #1402

Merged
meenchen merged 1 commit into main from weimingc/nvbug_6102977_v2 on May 6, 2026

Conversation

@meenchen
Contributor

@meenchen meenchen commented May 6, 2026

The export step's collect_shared_input_modules() runs a dummy forward pass to trace shared input modules for resmoothing/requantization. This forward call hits the same AttributeError as PR #1324 on configs that don't assign use_cache (e.g., stepfun-ai/Step-3.5-Flash's Step3p5Config):

AttributeError: 'Step3p5Config' object has no attribute 'use_cache'

Wrap the dummy forward with the _disable_use_cache context manager so it sets/restores config.use_cache around the call, mirroring the calibration fix from #1324. Covers all callsites since both unified_export_hf and plugins/vllm_fakequant_hf funnel through collect_shared_input_modules.

What does this PR do?

Type of change: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • Bug Fixes
    • Improved model export process by optimizing cache handling during profiling operations.


Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
@meenchen meenchen requested a review from a team as a code owner May 6, 2026 21:58
@meenchen meenchen requested a review from jingyu-ml May 6, 2026 21:58
@coderabbitai
Contributor

coderabbitai Bot commented May 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 614b1991-bc52-417f-9687-8a50c74c8c32

📥 Commits

Reviewing files that changed from the base of the PR and between 1c2085f and 52d7a1e.

📒 Files selected for processing (1)
  • modelopt/torch/export/unified_export_hf.py

📝 Walkthrough

Walkthrough

A single import is added for a cache-disabling utility, and the dummy forward probe in collect_shared_input_modules is wrapped with an additional context manager that disables the model config's use_cache during module input collection.

Changes

Cache Disabling During Input Module Collection Probe

Layer / File(s) Summary
Utility Import
modelopt/torch/export/unified_export_hf.py
_disable_use_cache is imported from modelopt.torch.utils.dataset_utils.
Probe Context Wrapping
modelopt/torch/export/unified_export_hf.py
Dummy forward probe is nested within _disable_use_cache(model) context manager to prevent cache use during input module collection.
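At the callsite, the change amounts to adding one more context manager to the probe's existing `with` block. A minimal sketch (all names are stand-ins: `nullcontext()` replaces the pre-existing `torch.no_grad()`, the model and forward are toy objects, and `_disable_use_cache` is assumed to set then remove `use_cache` as described in this PR; the real code lives in modelopt/torch/export/unified_export_hf.py):

```python
from contextlib import contextmanager, nullcontext
from types import SimpleNamespace


@contextmanager
def _disable_use_cache(model):
    """Assumed behavior: define use_cache=False, undo the change on exit."""
    had_attr = hasattr(model.config, "use_cache")
    original = getattr(model.config, "use_cache", None)
    model.config.use_cache = False
    try:
        yield
    finally:
        if had_attr:
            model.config.use_cache = original
        else:
            del model.config.use_cache


def dummy_forward(model):
    # Mirrors a forward pass that unconditionally reads config.use_cache,
    # which raised AttributeError for configs lacking the attribute.
    return model.config.use_cache


model = SimpleNamespace(config=SimpleNamespace())  # config without use_cache

# Probe nested in the extra context manager; nullcontext() stands in for
# whatever context (e.g. no-grad) already wrapped the dummy forward.
with nullcontext(), _disable_use_cache(model):
    result = dummy_forward(model)  # no AttributeError now

assert result is False
assert not hasattr(model.config, "use_cache")
```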

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: disabling use_cache during export dummy forward for custom configs, which directly addresses the AttributeError fix mentioned in the PR objectives.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns ✅ Passed No security anti-patterns found. PR adds safe context manager usage with no torch.load, numpy.load, trust_remote_code, eval, exec, nosec, or new dependencies.


@github-actions
Contributor

github-actions Bot commented May 6, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-05-06 23:13 UTC

@codecov

codecov Bot commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.82%. Comparing base (097293b) to head (52d7a1e).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1402      +/-   ##
==========================================
+ Coverage   76.73%   76.82%   +0.09%     
==========================================
  Files         476      476              
  Lines       51306    51307       +1     
==========================================
+ Hits        39368    39419      +51     
+ Misses      11938    11888      -50     
Flag Coverage Δ
examples 41.81% <100.00%> (+2.63%) ⬆️
gpu 59.84% <100.00%> (-0.58%) ⬇️
regression 15.20% <100.00%> (+0.07%) ⬆️
unit 52.54% <100.00%> (+<0.01%) ⬆️


@meenchen meenchen requested a review from cjluo-nv May 6, 2026 22:40
Collaborator

@cjluo-nv cjluo-nv left a comment


Bot review — DM the bot to share feedback.

Tiny, well-targeted bug fix (+5 -1). Reuses the existing _disable_use_cache context manager from dataset_utils to wrap the dummy forward pass in collect_shared_input_modules, mirroring the calibration fix from PR #1324. The context manager correctly handles configs that lack use_cache (sets it to False, restores/deletes on exit). No new code to speak of — just adding one more context manager to an existing with block.

@meenchen meenchen merged commit 579fc6c into main May 6, 2026
49 checks passed
@meenchen meenchen deleted the weimingc/nvbug_6102977_v2 branch May 6, 2026 23:12
@meenchen meenchen self-assigned this May 6, 2026
@meenchen meenchen added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label May 6, 2026
Collaborator

@cjluo-nv cjluo-nv left a comment


Bot review — DM the bot to share feedback.

Tiny, correct bug fix (+5 -1). Reuses the existing _disable_use_cache context manager to wrap the dummy forward in collect_shared_input_modules, matching the established pattern from the calibration fix in PR #1324. Handles configs lacking use_cache correctly. No new code, no design concerns.


Labels

cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc

2 participants