Fix OOM regression in _apply() for quantized models during inference #13372

Kosinkadink merged 2 commits into Comfy-Org:master
Conversation
Skip unnecessary clone of inference-mode tensors when already inside torch.inference_mode(), matching the existing guard in set_attr_param. The unconditional clone introduced in 20561aa caused transient VRAM doubling during model movement for FP8/quantized models.
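The guard described above can be sketched as follows. This is a minimal illustration, not the actual ComfyUI code: `move_param` is a hypothetical helper, and the assumption is that the clone is only needed when an inference-mode tensor escapes `torch.inference_mode()`.

```python
import torch

def move_param(param: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Hypothetical helper (not ComfyUI's actual function) showing the guard:
    only clone an inference-mode tensor when we are OUTSIDE
    torch.inference_mode(). Inside it, the moved tensor is directly usable,
    so the extra clone only doubles VRAM transiently for large FP8 weights."""
    moved = param.to(device)
    if moved.is_inference() and not torch.is_inference_mode_enabled():
        # Outside inference mode, in-place and view ops on inference
        # tensors are disallowed, so materialize a regular copy.
        moved = moved.clone()
    return moved

with torch.inference_mode():
    weight = torch.ones(4)  # an inference-mode tensor

with torch.inference_mode():
    same = move_param(weight, torch.device("cpu"))  # guard skips the clone
regular = move_param(weight, torch.device("cpu"))   # clone: regular tensor
```

With the guard, moving parameters while already inside `torch.inference_mode()` keeps the original storage instead of allocating a second copy, which is the transient VRAM doubling the PR removes.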
Thank you, that helped me.
Thanks for reproducing this. Yes, similar: I'm using WSL (not bare Linux). I did try --disable-dynamic-vram and got the same results, which is why I tried to find the issue using git bisect.
Since we've got a reproduction and a good amount of testing, merging.



I don't really do PyTorch, but I hit an OOM after upgrading to 0.18.2, which I eventually traced using git bisect to commit 20561aa. I used the qwen_image_edit_2511_fp8mixed model on the stock Image Edit (Qwen-Image 2511) workflow. If I run the workflow more than once, I get an OOM. I then asked Claude Code to help me debug and fix this regression. This small patch fixes my OOM issue on my 5090.
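For readers unfamiliar with the bisect step mentioned above: the real session bisected ComfyUI itself between release tags, but the mechanics can be demonstrated end to end in a throwaway repository. Everything below (repo contents, the "regression" at commit 4, the pass/fail test) is invented for illustration.

```shell
# Toy git-bisect session mirroring the debugging approach described above.
# Builds a 5-commit repo where "commit 4" introduces the regression, then
# lets `git bisect run` locate it automatically from a pass/fail script.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email bisect@example.com
git config user.name bisect-demo
for i in 1 2 3 4 5; do
  echo "$i" > state.txt
  git add state.txt
  git commit -qm "commit $i"
done
first=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$first" >/dev/null   # HEAD is bad, first commit is good
# The test exits non-zero once the regression (state >= 4) is present;
# `git bisect run` marks each revision good/bad from that exit code.
result=$(git bisect run sh -c 'test "$(cat state.txt)" -lt 4' | grep "first bad commit")
git bisect reset >/dev/null
echo "$result"
```

In the real debugging session the pass/fail check was a manual workflow run (OOM or not) rather than a script, so each step used `git bisect good` / `git bisect bad` by hand.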