Ensure removal of temp files on error in ONNX INT4 quantization by vishalpandya1990 · Pull Request #1359 · NVIDIA/Model-Optimizer

vishalpandya1990 · 2026-04-28T08:24:25Z

What does this PR do?

Type of change: Minor bug fix

Wraps the quantization steps in try/finally so that temporary files are removed even when ONNX INT4 quantization fails.
To avoid duplicating cleanup logic between awq_lite() and awq_clip(), adds a utility, _remove_augmented_onnx(), that removes the augmented ONNX file and its data file and handles any exceptions raised during removal.
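
The shape of the change can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code from `int4.py`: `remove_augmented_onnx`, `quantize_step`, `quantize_with_cleanup`, the `workdir` argument, and the `_data` suffix for the companion file are all hypothetical stand-ins.

```python
import logging
import os

logger = logging.getLogger(__name__)


def remove_augmented_onnx(onnx_path: str, use_external_data_format: bool) -> None:
    """Hypothetical helper: delete the temp model and, if requested, its data file."""
    paths = [onnx_path]
    if use_external_data_format:
        paths.append(onnx_path + "_data")  # assumed companion-file naming
    for path in paths:
        try:
            os.remove(path)
        except OSError as exc:
            # Log instead of raising so cleanup never masks the original error.
            logger.warning("Could not remove %s: %s", path, exc)


def quantize_step(onnx_path: str, fail: bool) -> str:
    """Stand-in for the real quantization work; optionally fails midway."""
    if fail:
        raise RuntimeError("simulated quantization failure")
    return "quantized"


def quantize_with_cleanup(workdir: str, fail: bool, use_external_data_format: bool = False) -> str:
    # Create the temporary "augmented" artifacts up front.
    onnx_path = os.path.join(workdir, "augmented.onnx")
    open(onnx_path, "wb").close()
    if use_external_data_format:
        open(onnx_path + "_data", "wb").close()
    try:
        return quantize_step(onnx_path, fail)
    finally:
        # Runs on success and on error alike, so nothing is left behind.
        remove_augmented_onnx(onnx_path, use_external_data_format)
```

Because the cleanup sits in `finally`, the original exception still propagates to the caller after the files are removed.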

Testing

Ran ONNX INT4 awq-lite and awq-clip quantization locally with a Llama 1B model.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅ / ❌ / N/A
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
Did you write any new necessary tests?: ✅ / ❌ / N/A
Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

Refactor
- Improved reliability of the quantization pipeline by ensuring temporary conversion artifacts are always removed, making cleanup more robust.
- Consolidated handling of external-data companions and added safer deletion behavior that logs failures instead of raising errors.
- Ensured consistent session teardown and forced memory collection to reduce resource leakage and intermittent errors during model conversion.

coderabbitai · 2026-04-28T08:24:38Z

📝 Walkthrough

Walkthrough

Adds a _remove_augmented_onnx helper to remove temp augmented ONNX files and optional external-data companions; refactors _quantize_awq_clip and _quantize_awq_lite to use try/finally for deterministic teardown: session references are cleared, gc.collect() is called in finally, and augmented artifacts are always removed via the new helper.

Changes

Cohort / File(s)	Summary
AWQ quantization & cleanup `modelopt/onnx/quantization/int4.py`	Adds `_remove_augmented_onnx` helper; wraps `_quantize_awq_clip` and `_quantize_awq_lite` in `try/finally`; replaces ad-hoc `os.remove` logic with the helper; ensures external-data companion removal when `use_external_data_format` is enabled; clears inference `session` references and calls `gc.collect()` from `finally`.
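
Why clearing the `session` reference is paired with an explicit `gc.collect()` can be illustrated with a toy object. The sketch below is only an illustration: `FakeSession` and `teardown_demo` are hypothetical, and whether a real inference session is actually held alive by a reference cycle is an assumption, not something this PR states.

```python
import gc
import weakref


class FakeSession:
    """Stand-in for an inference session caught in a reference cycle."""

    def __init__(self, model_path: str) -> None:
        self.model_path = model_path
        self.owner = self  # cycle: dropping the last name does not free the object


def teardown_demo() -> tuple:
    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep the demo deterministic: no automatic collection in between
    session = None  # defined before try so finally can always reference it
    probe = None
    try:
        session = FakeSession("augmented.onnx")
        probe = weakref.ref(session)  # lets us observe when the object is freed
    finally:
        session = None  # drop our reference ...
        alive_after_clear = probe is not None and probe() is not None
        gc.collect()  # ... then force collection of anything kept alive by cycles
        alive_after_collect = probe is not None and probe() is not None
        if gc_was_enabled:
            gc.enable()
    return alive_after_clear, alive_after_collect
```

In this toy case the object survives `session = None` and is only released by `gc.collect()`, which is the situation where removing a file that the object still holds open could otherwise fail on some platforms.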

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly reflects the main change: adding try-finally blocks to ensure temporary files are removed during error handling in ONNX INT4 quantization, which aligns with the code modifications.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	PR introduces no security anti-patterns. No unsafe deserialization, remote code execution flags, eval/exec calls, # nosec bypasses, or new non-permissive dependencies found.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/int4.py`:
- Around line 541-543: In both _quantize_awq_clip and _quantize_awq_lite,
initialize session = None before the try that calls
create_inference_session(augmented_onnx_path, calibration_eps,
input_shapes_profile), and in the finally block explicitly delete the session
(del session; session = None) and call gc.collect() to release file locks; also
change any logger.warn(...) calls to logger.warning(...). Replace the combined
try/except around the two os.remove(...) calls with independent removals: for
each temp artifact call os.remove(...) inside its own try that only catches
FileNotFoundError (silently ignore) and logs any other OSError with contextual
info (filename and function name) so the second file still gets attempted even
if the first fails.
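
The independent-removal suggestion above could look something like the following. It is a sketch under assumptions: `remove_artifacts` and its `caller` parameter are hypothetical names chosen for illustration and do not come from `int4.py`.

```python
import logging
import os

logger = logging.getLogger(__name__)


def remove_artifacts(paths, caller: str) -> list:
    """Try to remove every path independently; return the ones that failed."""
    failed = []
    for path in paths:
        try:
            os.remove(path)
        except FileNotFoundError:
            pass  # already gone: nothing to clean up
        except OSError as exc:
            # logger.warning rather than the deprecated logger.warn alias;
            # include the file name and calling function for context.
            logger.warning("%s: failed to remove %s: %s", caller, path, exc)
            failed.append(path)
    return failed
```

Each path gets its own `try`, so a failure on the first artifact does not prevent the attempt on the second.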


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 24ad79a7-275f-4e8e-bc64-2b285661c188

📥 Commits

Reviewing files that changed from the base of the PR and between 6e08b13 and 540cb9e.

📒 Files selected for processing (1)

modelopt/onnx/quantization/int4.py

github-actions · 2026-04-28T08:30:38Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-04-30 03:52 UTC

codecov · 2026-04-28T08:38:29Z

Codecov Report

❌ Patch coverage is 67.42424% with 86 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.88%. Comparing base (6e08b13) to head (ce5214b).
⚠️ Report is 7 commits behind head on main.

Files with missing lines	Patch %	Lines
modelopt/onnx/quantization/int4.py	67.42%	86 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1359      +/-   ##
==========================================
- Coverage   76.93%   76.88%   -0.05%     
==========================================
  Files         471      471              
  Lines       50401    50414      +13     
==========================================
- Hits        38777    38762      -15     
- Misses      11624    11652      +28
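
The percentages in the diff above follow directly from the hits and lines counts. The snippet below reproduces them, assuming the displayed values are truncated (not rounded) to two decimals, which is what the reported numbers are consistent with; `coverage_bp` is a hypothetical helper name.

```python
def coverage_bp(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, truncated toward zero."""
    return hits * 10000 // lines


base_bp = coverage_bp(38777, 50401)  # base (main): hits / lines
head_bp = coverage_bp(38762, 50414)  # head (#1359): hits / lines
delta_bp = head_bp - base_bp

# Misses are simply lines minus hits.
base_misses = 50401 - 38777
head_misses = 50414 - 38762

print(f"base {base_bp / 100:.2f}%  head {head_bp / 100:.2f}%  delta {delta_bp / 100:+.2f}%")
```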

Flag	Coverage Δ
examples	`41.58% <0.37%> (+0.91%)`	⬆️
gpu	`59.55% <61.74%> (-0.61%)`	⬇️
unit	`52.73% <64.77%> (-0.01%)`	⬇️



ajrasane

The fix looks good and CI is green, but consider adding a regression test for the cleanup-on-error path — that's the actual behavior this PR introduces, and right now nothing exercises it (codecov shows the new finally branches are uncovered).

A small unit test in tests/unit/onnx/quantization/test_quantize_zint4.py would do it: monkeypatch modelopt.onnx.quantization.int4.create_inference_session to raise, call _quantize_awq_clip / _quantize_awq_lite (or the public quantize entry point with awq_clip / awq_lite), assert the exception propagates, and assert no *.onnx / *.onnx_data files are left behind in tempfile.gettempdir() matching the augmented-model pattern. Bonus: parametrize over use_external_data_format=True/False so both helper branches are exercised.

Without this, future refactors of the augmented-graph flow could silently regress the leak this PR is fixing.
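
The suggested regression test could be shaped roughly like this. The sketch is self-contained and deliberately does not import modelopt: `fake_int4`, `_quantize_awq`, and `check_no_leak_on_error` are hypothetical stand-ins, and the `_data` suffix is an assumption. A real test would patch `modelopt.onnx.quantization.int4.create_inference_session` and would more likely use pytest's `monkeypatch`, `pytest.raises`, and `pytest.mark.parametrize`.

```python
import os
import tempfile
import types
from unittest import mock

# Stand-in namespace playing the role of the module under test.
fake_int4 = types.SimpleNamespace()


def _create_inference_session(path):
    return object()


def _quantize_awq(workdir, use_external_data_format):
    """Toy flow: write temp artifacts, open a session, always clean up."""
    onnx_path = os.path.join(workdir, "augmented.onnx")
    paths = [onnx_path]
    if use_external_data_format:
        paths.append(onnx_path + "_data")  # assumed companion-file naming
    for p in paths:
        open(p, "wb").close()
    try:
        # Looked up on the namespace at call time, so the patch takes effect.
        fake_int4.create_inference_session(onnx_path)
        return "ok"
    finally:
        for p in paths:
            if os.path.exists(p):
                os.remove(p)


fake_int4.create_inference_session = _create_inference_session
fake_int4.quantize_awq = _quantize_awq


def check_no_leak_on_error(use_external_data_format):
    """Force a failure and return the files left behind (expected: none)."""
    with tempfile.TemporaryDirectory() as workdir:
        boom = RuntimeError("session creation failed")
        with mock.patch.object(fake_int4, "create_inference_session", side_effect=boom):
            try:
                fake_int4.quantize_awq(workdir, use_external_data_format)
            except RuntimeError as exc:
                assert exc is boom  # the original error must propagate
            else:
                raise AssertionError("expected the patched call to raise")
        return sorted(os.listdir(workdir))
```

Calling `check_no_leak_on_error` with both `True` and `False` covers the two helper branches mentioned above.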

vishalpandya1990 · 2026-04-30T03:43:11Z

The fix looks good and CI is green, but consider adding a regression test for the cleanup-on-error path — that's the actual behavior this PR introduces, and right now nothing exercises it (codecov shows the new finally branches are uncovered).

A small unit test in tests/unit/onnx/quantization/test_quantize_zint4.py would do it: monkeypatch modelopt.onnx.quantization.int4.create_inference_session to raise, call _quantize_awq_clip / _quantize_awq_lite (or the public quantize entry point with awq_clip / awq_lite), assert the exception propagates, and assert no *.onnx / *.onnx_data files are left behind in tempfile.gettempdir() matching the augmented-model pattern. Bonus: parametrize over use_external_data_format=True/False so both helper branches are exercised.

Without this, future refactors of the augmented-graph flow could silently regress the leak this PR is fixing.

Yes, will follow-up on it separately.

vishalpandya1990 requested a review from a team as a code owner April 28, 2026 08:24

vishalpandya1990 requested a review from ajrasane April 28, 2026 08:24

coderabbitai Bot reviewed Apr 28, 2026


Comment thread modelopt/onnx/quantization/int4.py

vishalpandya1990 added 2 commits April 28, 2026 11:52

onnx int4 - remove temp files even on exception

42f3904

Signed-off-by: vipandya <vipandya@nvidia.com>

clear session before onnx file removal, handle removal of both onnx a…

ce5214b

…nd data file Signed-off-by: vipandya <vipandya@nvidia.com>

vishalpandya1990 force-pushed the vipandya/fix_temp_file_int4_issue branch from 76d393e to ce5214b Compare April 28, 2026 11:52

vishalpandya1990 added the cherry-pick-0.44.0 label ("After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc") Apr 28, 2026

ajrasane approved these changes Apr 29, 2026


vishalpandya1990 merged commit a492fa9 into main Apr 30, 2026
47 checks passed

vishalpandya1990 deleted the vipandya/fix_temp_file_int4_issue branch April 30, 2026 03:52

vishalpandya1990 mentioned this pull request May 3, 2026

Add unit test for checking any leak of temporary augmented onnx files, on exception during ONNX INT4 AWQ quantization #1383

Open

kevalmorabia97 mentioned this pull request May 4, 2026

[Cherry-pick] PRs #1352 #1351 #1330 #1354 #1355 #1360 #1342 #1324 #1340 #1368 #1373 #1359 #1361 #1325 #1369 #1370 #1371 #1385

Open

kevalmorabia97 added the cherry-pick-done label ("Added by bot once PR is cherry-picked to the release branch") May 4, 2026
