Ensure removal of temp files on error in ONNX INT4 quantization#1359

Merged
vishalpandya1990 merged 2 commits into main from vipandya/fix_temp_file_int4_issue on Apr 30, 2026

@vishalpandya1990
Contributor

@vishalpandya1990 vishalpandya1990 commented Apr 28, 2026

What does this PR do?

Type of change: Minor bug fix

  • Wrapped the quantization steps in try/finally so that temp files are removed even when ONNX INT4 quantization fails.
  • To avoid duplicating cleanup code in the awq_lite() and awq_clip() methods, added a utility, _remove_augmented_onnx(), that removes the augmented ONNX file and its data file with exception handling.
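
The try/finally pattern described above can be sketched roughly as follows. This is only an illustration, not the code in int4.py: `quantize_with_cleanup`, `_remove_files`, and the `work` callback are hypothetical names, an empty file stands in for the augmented ONNX model, and the `_data` suffix for the companion file is an assumption.

```python
import gc
import os
import tempfile


def _remove_files(*paths):
    """Hypothetical helper: best-effort removal of temp files."""
    for path in paths:
        try:
            os.remove(path)
        except FileNotFoundError:
            pass  # already gone, nothing to clean up


def quantize_with_cleanup(work, use_external_data=False):
    """Run `work(path)` against a temp file and always remove the temp artifacts."""
    fd, model_path = tempfile.mkstemp(suffix=".onnx")
    os.close(fd)
    data_path = model_path + "_data"  # assumed naming of the external-data companion
    if use_external_data:
        open(data_path, "wb").close()
    session = None
    try:
        session = object()  # stand-in for an inference session
        return work(model_path)
    finally:
        # Drop the session reference first so nothing keeps the files open,
        # then remove the artifacts whether or not `work` raised.
        session = None
        gc.collect()
        paths = [model_path] + ([data_path] if use_external_data else [])
        _remove_files(*paths)
```

Because the cleanup lives in `finally`, an exception raised by `work` still propagates to the caller, but the temp files are gone by the time it does.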

Testing

  • Locally performed ONNX INT4 awq-lite and awq-clip quantization with Llama 1B model.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • Refactor
    • Improved reliability of the quantization pipeline by ensuring temporary conversion artifacts are always removed, making cleanup more robust.
    • Consolidated handling of external-data companions and added safer deletion behavior that logs failures instead of raising errors.
    • Ensured consistent session teardown and forced memory collection to reduce resource leakage and intermittent errors during model conversion.

@vishalpandya1990 vishalpandya1990 requested a review from a team as a code owner April 28, 2026 08:24
@coderabbitai
Contributor

coderabbitai Bot commented Apr 28, 2026

Walkthrough

Adds a _remove_augmented_onnx helper to remove temp augmented ONNX files and optional external-data companions; refactors _quantize_awq_clip and _quantize_awq_lite to use try/finally for deterministic teardown: session references are cleared, gc.collect() is called in finally, and augmented artifacts are always removed via the new helper.

Changes

Cohort / File(s): AWQ quantization & cleanup (modelopt/onnx/quantization/int4.py)
Summary: Adds _remove_augmented_onnx helper; wraps _quantize_awq_clip and _quantize_awq_lite in try/finally; replaces ad-hoc os.remove logic with the helper; ensures external-data companion removal when use_external_data_format is enabled; clears inference session references and calls gc.collect() from finally.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
  • Title check: ✅ Passed. The title directly reflects the main change: adding try-finally blocks to ensure temporary files are removed during error handling in ONNX INT4 quantization, which aligns with the code modifications.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 80.00%, which is sufficient. The required threshold is 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Security Anti-Patterns: ✅ Passed. PR introduces no security anti-patterns. No unsafe deserialization, remote code execution flags, eval/exec calls, # nosec bypasses, or new non-permissive dependencies found.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/int4.py`:
- Around line 541-543: In both _quantize_awq_clip and _quantize_awq_lite,
initialize session = None before the try that calls
create_inference_session(augmented_onnx_path, calibration_eps,
input_shapes_profile), and in the finally block explicitly delete the session
(del session; session = None) and call gc.collect() to release file locks; also
change any logger.warn(...) calls to logger.warning(...). Replace the combined
try/except around the two os.remove(...) calls with independent removals: for
each temp artifact call os.remove(...) inside its own try that only catches
FileNotFoundError (silently ignore) and logs any other OSError with contextual
info (filename and function name) so the second file still gets attempted even
if the first fails.
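
A rough sketch of the independent-removal approach suggested above. `remove_temp_artifacts` and its `context` argument are hypothetical names chosen for illustration; this is not the helper merged in this PR.

```python
import logging
import os

logger = logging.getLogger(__name__)


def remove_temp_artifacts(paths, context):
    """Try to remove each path on its own; return the paths that could not be removed."""
    failed = []
    for path in paths:
        try:
            os.remove(path)
        except FileNotFoundError:
            continue  # a missing file is not an error during cleanup
        except OSError as exc:
            # Log with context and keep going so later files are still attempted.
            logger.warning("%s: could not remove %s: %s", context, path, exc)
            failed.append(path)
    return failed
```

Giving each path its own try block means a failure on the model file does not prevent the attempt on its data file.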

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 24ad79a7-275f-4e8e-bc64-2b285661c188

📥 Commits

Reviewing files that changed from the base of the PR and between 6e08b13 and 540cb9e.

📒 Files selected for processing (1)
  • modelopt/onnx/quantization/int4.py

Comment thread modelopt/onnx/quantization/int4.py
@github-actions
Contributor

github-actions Bot commented Apr 28, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-30 03:52 UTC

@codecov

codecov Bot commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 67.42424% with 86 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.88%. Comparing base (6e08b13) to head (ce5214b).
⚠️ Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/onnx/quantization/int4.py 67.42% 86 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1359      +/-   ##
==========================================
- Coverage   76.93%   76.88%   -0.05%     
==========================================
  Files         471      471              
  Lines       50401    50414      +13     
==========================================
- Hits        38777    38762      -15     
- Misses      11624    11652      +28     
Flag Coverage Δ
examples 41.58% <0.37%> (+0.91%) ⬆️
gpu 59.55% <61.74%> (-0.61%) ⬇️
unit 52.73% <64.77%> (-0.01%) ⬇️

Signed-off-by: vipandya <vipandya@nvidia.com>
…nd data file

Signed-off-by: vipandya <vipandya@nvidia.com>
@vishalpandya1990 vishalpandya1990 force-pushed the vipandya/fix_temp_file_int4_issue branch from 76d393e to ce5214b Compare April 28, 2026 11:52
@vishalpandya1990 vishalpandya1990 added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Apr 28, 2026
Contributor

@ajrasane ajrasane left a comment

The fix looks good and CI is green, but consider adding a regression test for the cleanup-on-error path — that's the actual behavior this PR introduces, and right now nothing exercises it (codecov shows the new finally branches are uncovered).

A small unit test in tests/unit/onnx/quantization/test_quantize_zint4.py would do it: monkeypatch modelopt.onnx.quantization.int4.create_inference_session to raise, call _quantize_awq_clip / _quantize_awq_lite (or the public quantize entry point with awq_clip / awq_lite), assert the exception propagates, and assert no *.onnx / *.onnx_data files are left behind in tempfile.gettempdir() matching the augmented-model pattern. Bonus: parametrize over use_external_data_format=True/False so both helper branches are exercised.

Without this, future refactors of the augmented-graph flow could silently regress the leak this PR is fixing.

@vishalpandya1990
Contributor Author

Yes, will follow-up on it separately.

@vishalpandya1990 vishalpandya1990 merged commit a492fa9 into main Apr 30, 2026
47 checks passed
@vishalpandya1990 vishalpandya1990 deleted the vipandya/fix_temp_file_int4_issue branch April 30, 2026 03:52
@kevalmorabia97 kevalmorabia97 added the cherry-pick-done Added by bot once PR is cherry-picked to the release branch label May 4, 2026