Eval bug: MTP and NTP output is random garbage #36

@ethernidee

Name and Version

bee llama-server v0.3.0

Operating systems

Windows

GGML backends

CUDA

Hardware

Ryzen 7500x + RTX 5060 Ti 16 GB

Models

Qwen3.6-35B-A3B-UD-IQ3_XXS-MTP.gguf

Problem description & steps to reproduce

MTP output is short and contains only random garbage. Upstream llama.cpp works fine with the same settings.

First Bad Commit

No response

Relevant log output

Logs
llama-server ^
  -m "e:\LMStudio_Models\models\unsloth\Qwen3.6-35B-A3B-MTP-GGUF\Qwen3.6-35B-A3B-UD-IQ3_XXS.gguf" ^
  --alias "Qwen3.6" ^
  --host 127.0.0.1 --port 8001 ^
  --ctx-size 60000 ^
  --fit off ^
  --n-gpu-layers 999 ^
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 ^
  --presence-penalty 0.0 --repeat-penalty 1.0 ^
  -ctk q8_0 -ctv q8_0 ^
  --flash-attn on ^
  --batch-size 512 --ubatch-size 256 ^
  --threads 8 --threads-batch 8 ^
  --no-mmap --mlock ^
  --parallel 1 --prio 2 ^
  --spec-type draft-mtp --spec-draft-n-max 3 ^
  --spec-draft-ngl 999 ^
  --log-verbosity 3 ^
  --metrics ^
  --log-colors off ^
  --ctx-checkpoints 0 ^
  --cache-ram 0 ^
  --jinja
