Name and Version
bee llama-server v0.3.0
Operating systems
Windows
GGML backends
CUDA
Hardware
Ryzen 7500x + RTX 5060 Ti 16 GB
Models
Qwen3.6-35B-A3B-UD-IQ3_XXS-MTP.gguf
Problem description & steps to reproduce
MTP output is short and contains random garbage only. Upstream llamacpp works ok with the same settings.
First Bad Commit
No response
Relevant log output
Logs
llama-server ^
-m "e:\LMStudio_Models\models\unsloth\Qwen3.6-35B-A3B-MTP-GGUF\Qwen3.6-35B-A3B-UD-IQ3_XXS.gguf" ^
--alias "Qwen3.6" ^
--host 127.0.0.1 --port 8001 ^
--ctx-size 60000 ^
--fit off ^
--n-gpu-layers 999 ^
--temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 ^
--presence-penalty 0.0 --repeat-penalty 1.0 ^
-ctk q8_0 -ctv q8_0 ^
--flash-attn on ^
--batch-size 512 --ubatch-size 256 ^
--threads 8 --threads-batch 8 ^
--no-mmap --mlock ^
--parallel 1 --prio 2 ^
--spec-type draft-mtp --spec-draft-n-max 3 ^
--spec-draft-ngl 999 ^
--log-verbosity 3 ^
--metrics ^
--log-colors off ^
--ctx-checkpoints 0 ^
--cache-ram 0 ^
--jinja
Name and Version
bee llama-server v0.3.0
Operating systems
Windows
GGML backends
CUDA
Hardware
Ryzen 7500x + RTX 5060 Ti 16 GB
Models
Qwen3.6-35B-A3B-UD-IQ3_XXS-MTP.gguf
Problem description & steps to reproduce
MTP output is short and contains random garbage only. Upstream llamacpp works ok with the same settings.
First Bad Commit
No response
Relevant log output
Logs