
Wan2.1 transformer GGUF load error #11088

Closed
@Passenger12138

Description

Describe the bug

I am testing Wan2.1 image-to-video generation on an RTX 4090, using the Diffusers-format Wan2.1 model (https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P-Diffusers) together with City96's quantized GGUF weights (https://huggingface.co/city96/Wan2.1-I2V-14B-480P-gguf/tree/main), following this documentation: https://huggingface.co/docs/diffusers/quantization/gguf. However, loading the GGUF transformer fails, and `from_single_file` appears to try to download additional data that I don't need. Any suggestions for resolving this would be greatly appreciated.

[Screenshot: traceback from the failed GGUF transformer load]
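
For reference, a possible workaround (untested; based on the `from_single_file` documentation) would be to pass the local Diffusers checkpoint as `config` so the loader reads the transformer config from disk instead of fetching it from the Hub:

# Untested sketch: model_id and ckpt_path are the same local paths used in
# the reproduction below. config= and subfolder= tell from_single_file where
# to find the Diffusers-format transformer config, avoiding extra downloads.
transformer = WanTransformer3DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    config=model_id,
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)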

Reproduction

import os
import torch
import numpy as np
from diffusers import (
    AutoencoderKLWan,
    WanImageToVideoPipeline,
    WanTransformer3DModel,
    UniPCMultistepScheduler,
    GGUFQuantizationConfig,
)
from diffusers.utils import export_to_video, load_image  # these live in diffusers.utils, not the top-level package
from transformers import CLIPVisionModel

# Model path
model_id = "/share/haobang.geng/cache/Wan2.1-I2V-14B-480P-Diffusers"

# Load the image encoder
image_encoder = CLIPVisionModel.from_pretrained(
    model_id, subfolder="image_encoder", torch_dtype=torch.float32
)

# Load the VAE
vae = AutoencoderKLWan.from_pretrained(
    model_id, subfolder="vae", torch_dtype=torch.float32
)

# Load the transformer with GGUF quantization enabled
ckpt_path = "/share/haobang.geng/cache/Wan2.1-I2V-14B-480P-gguf/wan2.1-i2v-14b-480p-Q4_1.gguf"
transformer = WanTransformer3DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
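# NOTE: this is the call that raises the error shown in the screenshot above;
# see the untested config=/subfolder= workaround sketched in the description.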

# Configure the scheduler
scheduler = UniPCMultistepScheduler(
    prediction_type='flow_prediction',
    use_flow_sigmas=True,
    num_train_timesteps=1000,
    flow_shift=3.0  # flow_shift value for 480P output
)

# Assemble the pipeline
pipe = WanImageToVideoPipeline.from_pretrained(
    model_id, 
    transformer=transformer, 
    vae=vae, 
    image_encoder=image_encoder, 
    torch_dtype=torch.bfloat16, 
    scheduler=scheduler
)

pipe.enable_model_cpu_offload()

# Load the input image and resize it to dimensions compatible with the model
image = load_image("/share/haobang.geng/dataset/character-imgs/1.jpeg")
max_area = 480 * 832
aspect_ratio = image.height / image.width
mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
image = image.resize((width, height))
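# Worked example of the rounding above (hypothetical input size): for a
# 1280x720 image, aspect_ratio = 720 / 1280 = 0.5625. Assuming
# vae_scale_factor_spatial = 8 and patch_size[1] = 2 for this model,
# mod_value = 16, so with max_area = 480 * 832 = 399360:
#   height = round(sqrt(399360 * 0.5625)) // 16 * 16 = 474 // 16 * 16 = 464
#   width  = round(sqrt(399360 / 0.5625)) // 16 * 16 = 843 // 16 * 16 = 832
# i.e. both dimensions are snapped down to multiples of 16 while keeping
# height * width <= max_area.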

# Prompts
prompt = (
    "Anime thick painted cartoon illustration, a blonde girl gracefully lifted her wine glass and gently sipped a mouthful of red wine. "
    "She has delicate facial features, purple eyes shimmering with wisdom, and cheeks slightly red, appearing particularly charming. "
    "She was dressed in luxurious traditional attire, adorned with a golden headpiece, and the background was a blurry indoor scene with faintly visible wooden structures. "
    "Soft light and shadow effects create a classical and romantic atmosphere. Close up half body close-up perspective."
)
negative_prompt = (
    "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, "
    "JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, "
    "still picture, messy background, three legs, many people in the background, walking backwards"
)

# Run image-to-video generation
output = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=height,
    width=width,
    num_frames=81,
    guidance_scale=5.0,
    num_inference_steps=40,
).frames[0]

os.makedirs("/share/haobang.geng/code/wanx-baseline/results/check-wanmulti-framework/gguf", exist_ok=True)
# Export the video
export_to_video(
    output, 
    "/share/haobang.geng/code/wanx-baseline/results/check-wanmulti-framework/gguf/wanx-diffusers-Q4.mp4", 
    fps=15
)

Logs

System Info


  • 🤗 Diffusers version: 0.33.0.dev0
  • Platform: Linux-5.15.0-71-generic-x86_64-with-glibc2.35
  • Running on Google Colab?: No
  • Python version: 3.10.0
  • PyTorch version (GPU?): 2.6.0+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.29.3
  • Transformers version: 4.46.2
  • Accelerate version: 1.5.2
  • PEFT version: not installed
  • Bitsandbytes version: not installed
  • Safetensors version: 0.5.3
  • xFormers version: not installed
  • Accelerator: NVIDIA GeForce RTX 4090, 24564 MiB (×8)
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@DN6 @a-r-r-o-w


Labels

bug (Something isn't working), stale (Issues that haven't received updates)
