On a 3090 (Win10) I get a full VRAM fill even though I use Q5, which is 12 GB



The Flux Q8 GGUF (12 GB) loads fast, but this Q5 one, which is also 12 GB, takes forever to load, and it does so EACH TIME I generate an image. I also get the message below, which runs for about 1 minute before generation starts:

Falling back to numpy dequant for qtype: 21
Falling back to numpy dequant for qtype: 21
Falling back to numpy dequant for qtype: 21
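
The number in that warning looks like a ggml quantization type id. As a rough sketch, assuming the ids match `GGMLQuantizationType` in gguf-py (the table below is a hand-copied subset, so check it against your installed version, and `qtype_name` is a hypothetical helper), 21 would be `IQ3_S`. That is an i-quant rather than a K-quant, which might explain why some tensors in these files go through the slower numpy path:

```python
# Subset of ggml quantization type ids, transcribed by hand.
# Assumption: these match GGMLQuantizationType in gguf-py; verify locally.
GGML_QTYPE_NAMES = {
    8: "Q8_0",
    10: "Q2_K", 11: "Q3_K", 12: "Q4_K", 13: "Q5_K", 14: "Q6_K",
    16: "IQ2_XXS", 17: "IQ2_XS", 18: "IQ3_XXS", 19: "IQ1_S",
    20: "IQ4_NL", 21: "IQ3_S", 22: "IQ2_S", 23: "IQ4_XS",
}


def qtype_name(qtype_id: int) -> str:
    """Return a readable name for a ggml qtype id (hypothetical helper)."""
    return GGML_QTYPE_NAMES.get(qtype_id, f"unknown ({qtype_id})")


# The id reported in the warning above.
print(qtype_name(21))
```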

From what I noticed, the Q2 and Q3 GGUF versions of Llama cause the fallback, but Q6 is just too slow.
My speed with Q4 Llama and Q2 T5 (the only ones that loaded fast) for 1024x1024, 18 steps, is this:
got prompt
Requested to load HiDreamTEModel_
loaded completely 8356.927012252807 7869.5537109375 True
100%|██████████████████████████████████████████████████████████████████████████████████| 18/18 [00:59<00:00,  3.32s/it]
Prompt executed in 75.36 seconds
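
A quick breakdown of the numbers in that log, as a sketch: it only uses the step count, the per-step time, and the total from the lines above, and assumes that everything outside the sampling loop (model loading, text encoding, decoding) accounts for the remainder.

```python
# Values copied from the log above.
steps = 18
seconds_per_step = 3.32
total_seconds = 75.36

# Time spent in the sampling loop itself.
sampling_seconds = steps * seconds_per_step

# Assumption: whatever is left is loading, text encoding, decoding, etc.
overhead_seconds = total_seconds - sampling_seconds

print(f"sampling: {sampling_seconds:.2f} s")
print(f"overhead: {overhead_seconds:.2f} s")
```

So about 60 of the 75 seconds go to sampling and roughly 15 to everything else.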

![Image](https://github.com/user-attachments/assets/4fea389f-8a5c-4eae-a694-6a09419c4c7d)








on 3090 win10 i get full vram fill even tho i use q5 that is 12gb #255
