Conversation

@jiqing-feng
Collaborator

The exception e is not just an ImportError instance; it also holds a reference to a traceback object.
The traceback references every stack frame that was live at the time of the error, and each frame keeps its local variables alive.
Keeping the exception in a module-level global therefore prevents all of those objects from ever being released, for example:

import sys
import gc

import torch
from diffusers import (
    FluxControlPipeline,
    FluxTransformer2DModel,
    GGUFQuantizationConfig,
)


torch_device = 0

# Pick whichever accelerator backend is available (CUDA or Intel XPU).
accelerator_module = None
if torch.cuda.is_available():
    accelerator_module = torch.cuda
elif torch.xpu.is_available():
    accelerator_module = torch.xpu


def test_lora_loading():
    ckpt_path = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"
    transformer = FluxTransformer2DModel.from_single_file(
        ckpt_path,
        quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
        torch_dtype=torch.bfloat16,
    ).to(torch_device)
    pipe = FluxControlPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev",
        transformer=transformer,
        torch_dtype=torch.bfloat16,
    ).to(torch_device)
    # The refcount should not change across load_lora_weights; any increase
    # means something is holding extra references to the model.
    print(f"transformer ref before lora loading: {sys.getrefcount(transformer)}")
    pipe.load_lora_weights("black-forest-labs/FLUX.1-Canny-dev-lora")
    print(f"transformer ref after lora loading: {sys.getrefcount(transformer)}")

    # Drop all local references and force a collection; if nothing else holds
    # the model, its device memory becomes reclaimable here.
    del transformer
    del pipe
    gc.collect()
    accelerator_module.empty_cache()


print(f"memory before running: {accelerator_module.max_memory_allocated()}")
test_lora_loading()
print(f"memory after running: {accelerator_module.max_memory_allocated()}")
# max_memory_allocated() reports the peak. After reset_peak_memory_stats(),
# it reflects only memory that is still allocated, so a non-zero value here
# indicates a leak.
accelerator_module.reset_peak_memory_stats()
print(f"memory after reset_peak_memory_stats: {accelerator_module.max_memory_allocated()}")

Output before this PR:

memory before running: 0
transformer ref before lora loading: 3
transformer ref after lora loading: 12
memory after running: 15408069632
memory after reset_peak_memory_stats: 15408069632
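
The leaking pattern looks like this (a minimal sketch; the module and variable names are hypothetical, not taken from this repository):

# The global keeps the exception, the exception keeps __traceback__, the
# traceback keeps every stack frame, and each frame keeps its locals alive.
try:
    import some_optional_dependency  # hypothetical optional import
except ImportError as e:
    OPTIONAL_IMPORT_ERROR = e  # module-level global; e is never released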

We can store the string of the exception instead of the exception object itself. That way the global holds only a short message rather than the exception and its full traceback.
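
A minimal sketch of the fixed pattern, using the same hypothetical names as above:

try:
    import some_optional_dependency
except ImportError as e:
    OPTIONAL_IMPORT_ERROR_MSG = str(e)  # store only the message, no traceback
# When the except block exits, Python deletes e automatically, so the
# traceback (and everything its frames reference) can be garbage collected.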

Output after this PR:

memory before running: 0
transformer ref before lora loading: 3
transformer ref after lora loading: 3
memory after running: 15408069632
memory after reset_peak_memory_stats: 0

@jiqing-feng jiqing-feng requested a review from Qubitium August 14, 2025 02:09
@jiqing-feng
Collaborator Author

Hi @Qubitium . Would you please review this PR? Thanks!

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@Qubitium Qubitium merged commit 37a0d9f into ModelCloud:main Aug 15, 2025
1 check passed
@Qubitium
Collaborator

@jiqing-feng Thank you for the memory leak fix!
