Conversation

@jiqing-feng
Collaborator

The exception e is not just an ImportError instance; it also holds a reference to a traceback object.
The traceback references every stack frame that was live at the time of the error, and each frame keeps its local variables alive.
Keeping the exception in a module-level global therefore prevents all of those objects from ever being released, for example:

import sys
import gc

import torch
from diffusers import (
    FluxControlPipeline,
    FluxTransformer2DModel,
    GGUFQuantizationConfig,
)


torch_device = 0

# Pick whichever accelerator backend is available (CUDA or Intel XPU).
accelerator_module = None
if torch.cuda.is_available():
    accelerator_module = torch.cuda
elif torch.xpu.is_available():
    accelerator_module = torch.xpu


def test_lora_loading():
    ckpt_path = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"
    transformer = FluxTransformer2DModel.from_single_file(
        ckpt_path,
        quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
        torch_dtype=torch.bfloat16,
    ).to(torch_device)
    pipe = FluxControlPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev",
        transformer=transformer,
        torch_dtype=torch.bfloat16,
    ).to(torch_device)
    # The refcount should not change across load_lora_weights; any increase
    # means something is holding extra references to the model.
    print(f"transformer ref before lora loading: {sys.getrefcount(transformer)}")
    pipe.load_lora_weights("black-forest-labs/FLUX.1-Canny-dev-lora")
    print(f"transformer ref after lora loading: {sys.getrefcount(transformer)}")

    # Drop all local references and force a collection; if nothing else holds
    # the model, its device memory becomes reclaimable here.
    del transformer
    del pipe
    gc.collect()
    accelerator_module.empty_cache()


print(f"memory before running: {accelerator_module.max_memory_allocated()}")
test_lora_loading()
print(f"memory after running: {accelerator_module.max_memory_allocated()}")
# max_memory_allocated() reports the peak. After reset_peak_memory_stats(),
# it reflects only memory that is still allocated, so a non-zero value here
# indicates a leak.
accelerator_module.reset_peak_memory_stats()
print(f"memory after reset_peak_memory_stats: {accelerator_module.max_memory_allocated()}")

Output before this PR:

memory before running: 0
transformer ref before lora loading: 3
transformer ref after lora loading: 12
memory after running: 15408069632
memory after reset_peak_memory_stats: 15408069632
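
The leaking pattern looks like this (a minimal sketch; the module and variable names are hypothetical, not taken from this repository):

# The global keeps the exception, the exception keeps __traceback__, the
# traceback keeps every stack frame, and each frame keeps its locals alive.
try:
    import some_optional_dependency  # hypothetical optional import
except ImportError as e:
    OPTIONAL_IMPORT_ERROR = e  # module-level global; e is never released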

We can store the string of the exception instead of the exception object itself. That way the global holds only a short message rather than the exception and its full traceback.
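
A minimal sketch of the fixed pattern, using the same hypothetical names as above:

try:
    import some_optional_dependency
except ImportError as e:
    OPTIONAL_IMPORT_ERROR_MSG = str(e)  # store only the message, no traceback
# When the except block exits, Python deletes e automatically, so the
# traceback (and everything its frames reference) can be garbage collected.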

Output after this PR:

memory before running: 0
transformer ref before lora loading: 3
transformer ref after lora loading: 3
memory after running: 15408069632
memory after reset_peak_memory_stats: 0

@jiqing-feng jiqing-feng requested a review from Qubitium August 14, 2025 02:09
@jiqing-feng
Collaborator Author

Hi @Qubitium . Would you please review this PR? Thanks!

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@Qubitium Qubitium merged commit 37a0d9f into ModelCloud:main Aug 15, 2025
1 check passed
@Qubitium
Collaborator

@jiqing-feng Thank you for the memory leak fix!
