No speed gain on 20gb vram with 27b q4 #22

@cryptopsy0

I am using a 7900 XT with 20 GB of VRAM. I have tried both the HIP and Vulkan backends, and with neither do I get a noticeable speed increase using the default settings recommended in the guide:

"$BEE_SERVER" --host 0.0.0.0 --port $PORT \
-m $MODEL -md $DRAFT \
--jinja --chat-template-kwargs '{"enable_thinking":true}' \
-ngld all -ngl all -np 1 --reasoning on --cache-ram 0 \
--spec-type dflash --spec-dflash-cross-ctx 512 \
--kv-unified -b 2048 -ub 256 \
--spec-draft-n-max 3 \
--log-timestamps --log-prefix --log-colors off \
--no-mmap --mlock --no-host \
--temp 0.6 --top-k 20 --min-p 0.0 \
-ctk turbo3 -ctv turbo3 \
-fa on --metrics -c 64000

MODEL=Qwen3.6-27B-Q4_K_M.gguf
DRAFT=dflash-draft-3.6-q4_k_m.gguf

I have the context set to 64k because that is the minimum Hermes requires.
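Since both models are fully offloaded (`-ngl all -ngld all`), one quick sanity check is whether the two GGUF files even fit in the 20 GB budget together. The sketch below is only a rough check under stated assumptions: `total_bytes` and `fits_in_gib` are hypothetical helper names, not part of the server or the guide, and the check counts weight files only. The KV cache for the 64k context and the compute buffers also need VRAM, so passing this check does not guarantee that nothing spills out of VRAM.

```shell
#!/bin/sh
# Hypothetical helper: print the combined size of the given files in bytes.
total_bytes() {
    total=0
    for f in "$@"; do
        # wc -c with redirected input prints only the byte count.
        size=$(wc -c < "$f") || return 1
        total=$((total + size))
    done
    echo "$total"
}

# Hypothetical helper: succeed if the files fit within a budget given in GiB.
# Counts file sizes only; KV cache and compute buffers are NOT included.
fits_in_gib() {
    budget_gib=$1
    shift
    used=$(total_bytes "$@") || return 1
    [ "$used" -le $((budget_gib * 1024 * 1024 * 1024)) ]
}

# Usage, with MODEL and DRAFT set as in this issue:
#   if fits_in_gib 20 "$MODEL" "$DRAFT"; then
#       echo "weights fit in 20 GiB (before KV cache and buffers)"
#   else
#       echo "weights alone exceed 20 GiB"
#   fi
```

If the weights alone come close to the budget, the remaining headroom for a 64k context may be too small regardless of backend.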
