GGUF CLIP/Text Encoders don't work on Intel Arc. #50
I've seen reports of this happening with that "ViT-L-14-BEST-smooth-GmP-ft" checkpoint. Could you test the default clip-l.safetensors one to see if it still happens? There should also be a CLI flag to disable ipex optimizations (
Using the regular clip-l didn't change anything; I still run into the problem with the missing GGMLTensor object attribute. As for disabling optimizations, I wrote the code for that flag and the default optimization call for
Got it, thanks for testing, will investigate and try to fix.
I have the exact same error running on Apple Silicon M1 Max 64GB when using the
@city96 I have done some deep diving and I think I have everything working now. I had to add the following patch, which implements `__deepcopy__` for `GGMLTensor`:

```diff
diff --git a/ops.py b/ops.py
index 6bc1e2a..7b13806 100644
--- a/ops.py
+++ b/ops.py
@@ -39,6 +39,9 @@ class GGMLTensor(torch.Tensor):
         except Exception as e:
             print(f"ignoring 'copy_' on tensor")
+    def __deepcopy__(self, memo):
+        return self.detach().clone()
+
     @property
     def shape(self):
         if not hasattr(self, "tensor_shape"):
```

This and the changes I have in comfyanonymous/ComfyUI#4562 should allow XPU to work correctly with the GGUF Text Encoder/CLIP loaders you wrote. I believe you should be able to make this change independently of that pull request. Let me know if you have any concerns.

Edit: Visual snippet of a modified ComfyUI Examples workflow I tested to make sure this worked.
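For context, `copy.deepcopy` dispatches to a `__deepcopy__(memo)` method whenever the class defines one, which is the hook the patch above relies on. A minimal pure-Python sketch of that protocol (a toy class, not the real `GGMLTensor`, and no torch dependency):

```python
import copy

# Toy stand-in for a tensor-like object. Illustrates that copy.deepcopy()
# calls __deepcopy__(memo) when the class defines it.
class ToyTensor:
    def __init__(self, data):
        self.data = list(data)

    def __deepcopy__(self, memo):
        # Analogous to `self.detach().clone()` in the patch: return a
        # fresh, independent copy of the underlying data.
        new = ToyTensor(self.data)
        memo[id(self)] = new  # standard memoization for shared references
        return new

t = ToyTensor([1, 2, 3])
c = copy.deepcopy(t)
assert c is not t and c.data == [1, 2, 3]
c.data[0] = 99
assert t.data[0] == 1  # the copies are independent
```

Note that, like `detach().clone()`, this sketch only copies the payload; any extra attributes on the instance are dropped unless re-attached explicitly, which is relevant to the alternatives discussed below in the thread.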
@simonlui
@city96 I tested it out and it does work too. |
I don't have intel so I can't test, but what about any of these:

```python
def __deepcopy__(self, *args, **kwargs):
    return super().__deepcopy__(*args, **kwargs)
```

```python
def __deepcopy__(self, *args, **kwargs):
    return self.data.__deepcopy__(*args, **kwargs)
```

```python
def __deepcopy__(self, *args, **kwargs):
    new = super().__deepcopy__(*args, **kwargs)
    new.tensor_type = getattr(self, "tensor_type", None)
    new.tensor_shape = getattr(self, "tensor_shape", new.data.shape)
    new.patches = getattr(self, "patches", []).copy()
    return new
```
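The third variant is the interesting one: a bare copy drops the quantization metadata (`tensor_type`, `tensor_shape`, `patches`) that downstream code reads, so it re-attaches those attributes after copying. A hedged pure-Python sketch of that re-attach pattern, using a toy class rather than the real `GGMLTensor`:

```python
import copy

class QuantLike:
    """Toy stand-in for a quantized tensor carrying extra metadata."""
    def __init__(self, data):
        self.data = data
        self.tensor_type = "Q8_0"        # illustrative value
        self.tensor_shape = (len(data),)
        self.patches = []

    def __deepcopy__(self, *args, **kwargs):
        # Same shape as the third variant above: deep-copy the payload,
        # then re-attach metadata so it survives copy.deepcopy().
        new = QuantLike(copy.deepcopy(self.data, *args, **kwargs))
        new.tensor_type = getattr(self, "tensor_type", None)
        new.tensor_shape = getattr(self, "tensor_shape", None)
        new.patches = getattr(self, "patches", []).copy()
        return new

q = QuantLike([1, 2, 3])
d = copy.deepcopy(q)
assert d.tensor_type == "Q8_0"
assert d.tensor_shape == (3,)
assert d.patches == [] and d.patches is not q.patches  # copied, not shared
```

The `getattr(..., default)` calls mirror the snippet above: they keep the copy working even if an instance was created without some of the metadata attached.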
All of these implementations work, but I am guessing you want to use the last one since it aligns with changes you already made, which is fine by me.
@simonlui Thanks for the research/testing. Added - hopefully it's fine. |
Yep, did a final test, everything checks out. Thanks for making the change. |
Hey everyone / @city96, I'm the author of

I am assuming the issue may have been due to:
1. Fine-tuning based on the original OpenAI/CLIP code, and
2. then just converting the .pt model to .safetensors without converting to the key syntax HF uses for the model, and
3. not "detaching" the vision transformer (it's a full text-vision transformer model .safetensors).

For my previous model, I supplied a text-encoder-only HF-format version, but there didn't seem to be much interest in that, so I omitted the potential "choice confusion" for my latest model. Judging by this thread, it seems there's demand for that after all. Alas: if you could either
then I would be happy to upload a 'proper' HF model / proper text encoder for this model, and for future ones. Kind regards!
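For illustration, "detaching" the vision transformer from a full text-vision checkpoint usually amounts to dropping the vision-tower keys from the state dict before saving. A hedged sketch, assuming the original OpenAI CLIP layout where the vision tower lives under the `visual.` prefix; the helper name is hypothetical, and other conversions may name the tower differently:

```python
# Hypothetical helper: keep only the text-encoder weights from a full
# CLIP state dict. "visual." is the vision-tower prefix in the original
# OpenAI CLIP layout; adjust for other checkpoint conventions.
def strip_vision_tower(state_dict, vision_prefix="visual."):
    return {k: v for k, v in state_dict.items()
            if not k.startswith(vision_prefix)}

sd = {
    "visual.conv1.weight": "...",
    "visual.proj": "...",
    "token_embedding.weight": "...",  # text-side keys (OpenAI CLIP names)
    "ln_final.weight": "...",
}
text_sd = strip_vision_tower(sd)
assert set(text_sd) == {"token_embedding.weight", "ln_final.weight"}
```

Converting the remaining keys to the HF naming scheme is a separate step on top of this; the sketch only covers the "detaching" part.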
@zer0int I think comfy handles the logic for loading/key conversion internally (it uses a custom implementation for CLIP instead of importing from transformers/open-clip), and since we're not quantizing/touching clip-l it does work correctly on the latest commit afaik. The issue was that metadata was stripped from our custom quantized tensors during the conversion (even for FP16), which caused problems even with non-quantized models, but I've added a fallback to just treat them like normal tensors in that case. The intel issue was related to something else.
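The fallback described here can be sketched generically: when the custom metadata is absent, read the ordinary attribute instead of raising. A toy illustration (class and function names are hypothetical, not the actual ComfyUI-GGUF code):

```python
class PlainTensorLike:
    """Stand-in for a tensor whose custom metadata was stripped."""
    shape = (2, 3)

class QuantTensorLike(PlainTensorLike):
    """Stand-in for a quantized tensor with its original shape recorded."""
    tensor_shape = (4, 5)

def effective_shape(t):
    # Fallback pattern: if the quantization metadata is missing, treat
    # the object like a normal tensor and use its ordinary shape.
    return getattr(t, "tensor_shape", tuple(t.shape))

assert effective_shape(QuantTensorLike()) == (4, 5)
assert effective_shape(PlainTensorLike()) == (2, 3)
```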
So I hate to rain on the newly committed code, but I have errors with the current implementation loading GGUF CLIP/text encoders with Intel Arc using ComfyUI commit 5a69f84. If I try to load it with GPU only using `--gpu-only`, I get the following backtrace. With `--lowvram`, I get this backtrace instead.

Understandably, this might not be a high priority. But just putting it out there for people who may run into the same issue I did.