
On a 3090 (Win10) VRAM fills up completely even though I use a Q5 that is 12 GB #255

@1blackbar


The Flux Q8 GGUF (12 GB) loads fast, but this Q5 one, which is also 12 GB, takes forever to load, and it does so EACH TIME I generate an image. I also get the message below, which repeats for about 1 minute before generation starts:

Falling back to numpy dequant for qtype: 21
Falling back to numpy dequant for qtype: 21
Falling back to numpy dequant for qtype: 21
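
To see which quant format the message is talking about, the qtype number can be mapped to a name. This is only a rough sketch: it assumes the number follows the `GGMLQuantizationType` enum from the `gguf` Python package, and the table is a hand-copied subset of that enum, so check it against the installed package before relying on it. `QTYPE_NAMES` and `qtype_name` are made-up names for illustration.

```python
# Hand-copied subset of GGMLQuantizationType (assumption: matches the gguf package).
QTYPE_NAMES = {
    2: "Q4_0",
    8: "Q8_0",
    10: "Q2_K",
    11: "Q3_K",
    12: "Q4_K",
    13: "Q5_K",
    14: "Q6_K",
    20: "IQ4_NL",
    21: "IQ3_S",
    22: "IQ2_S",
    23: "IQ4_XS",
}


def qtype_name(qtype: int) -> str:
    """Return the quant name for a qtype id, or a placeholder if it is not in the table."""
    return QTYPE_NAMES.get(qtype, f"unknown({qtype})")


print(qtype_name(21))
```

If the table is right, qtype 21 is an I-quant rather than one of the K-quants, which would fit the slow numpy fallback showing up only with some of the files.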

From what I noticed, the Q2 and Q3 GGUF versions of Llama cause the fallback, while Q6 is just too slow.
My speed with Q4 Llama and Q2 T5 (the only ones that loaded fast) for 1024x1024 at 18 steps is this:
got prompt
Requested to load HiDreamTEModel_
loaded completely 8356.927012252807 7869.5537109375 True
100%|██████████████████████████████████████████████████████████████████████████████████| 18/18 [00:59<00:00, 3.32s/it]
Prompt executed in 75.36 seconds
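
A quick breakdown of those numbers, as a rough sketch: it takes the 18 steps, 3.32 s/it and 75.36 s total from the log above and assumes everything that is not sampling counts as loading and other overhead. `timing_breakdown` is a made-up helper name.

```python
def timing_breakdown(steps: int, sec_per_it: float, total_s: float) -> tuple[float, float]:
    """Split total prompt time into sampling time and everything else."""
    sampling_s = steps * sec_per_it      # time spent in the sampler loop
    overhead_s = total_s - sampling_s    # model loading, text encoding, VAE decode, etc.
    return sampling_s, overhead_s


sampling_s, overhead_s = timing_breakdown(steps=18, sec_per_it=3.32, total_s=75.36)
print(f"sampling: {sampling_s:.2f} s, overhead: {overhead_s:.2f} s")
```

So in the fast case roughly 60 s is sampling and about 15 s is everything else, which is the baseline to compare the slow Q5 load against.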

