The Flux Q8 GGUF (12 GB) loads fast, but this Q5 one, which is also 12 GB, takes forever to load, and it does so EACH TIME I generate an image. I also get the message below, which repeats for about a minute before generation starts:
Falling back to numpy dequant for qtype: 21
Falling back to numpy dequant for qtype: 21
Falling back to numpy dequant for qtype: 21
From what I've noticed, the Q2 and Q3 GGUF versions of Llama cause the fallback, but Q6 is just too slow.
My speed with Q4 Llama and Q2 T5 (the only ones that loaded fast for me) is this, for 1024x1024 at 18 steps:
got prompt
Requested to load HiDreamTEModel_
loaded completely 8356.927012252807 7869.5537109375 True
100%|██████████████████████████████████████████████████████████████████████████████████| 18/18 [00:59<00:00, 3.32s/it]
Prompt executed in 75.36 seconds
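
For reference, a quick breakdown of that run. This is only arithmetic on the numbers in the log above; the variable names are mine, and I'm assuming the time outside the sampling loop is mostly model loading, text encoding and decoding:

```python
# Numbers copied from the log above.
steps = 18
sec_per_step = 3.32   # tqdm reports 3.32s/it
total_sec = 75.36     # "Prompt executed in 75.36 seconds"

# Time spent inside the sampling loop.
sampling_sec = steps * sec_per_step

# Everything else (assumed: loading, text encoding, decoding).
other_sec = total_sec - sampling_sec

print(f"sampling: {sampling_sec:.2f} s")
print(f"other:    {other_sec:.2f} s")
```

So roughly a minute is the sampling itself and the rest is overhead around it.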
