Misc. bug: mxfp6 and turbo3 share the same ID (42), which prevents mxfp6 models from loading #49

@ovadmani-sudo

Name and Version

version: 9459 (07ac3ce)
built with Clang 21.1.8 for Linux x86_64

Thank you.
ovadmani

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

#!/bin/bash                                                                                                                            
                                                                                                                                       
# Define the exact absolute path based on your verified folder structure                                                               
BUILD_DIR="/home/ovadm/beellama.cpp/rocm-build"                                                                                        
                                                                                                                                       
# 1. Map both local build libraries and global ROCm targets cleanly                                                                    
export LD_LIBRARY_PATH="$BUILD_DIR/bin:/opt/rocm/lib:/opt/rocm/lib64:/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH"                       
                                                                                                                                       
# 2. Hardware Architecture Overrides for Ryzen AI Max+ 395 (Strix Halo)                                                                
export HSA_OVERRIDE_GFX_VERSION=11.5.1                                                                                                 
export HCC_AMDGPU_TARGET=gfx1151                                                                                                       
export PYTORCH_ROCM_ARCH=gfx1151                                                                                                       
export GGML_HIP_NO_PINNED=0                                                                                                            

# 3. Unified Memory and Power Constraints Optimization
export AMD_DISABLE_GFXOFF=1
export HIP_FORCE_POINTER_MAPPING=1                                                                                                     
                                                                                                                                       
echo "Starting BeeLlama Server with DFlash speculation on gfx1151..."                                                                  
                                                                                                                                       
# 4. Invoke the binary using its absolute location path                                                                                
ARGS=(
   # --verbose
    #-m /home/ovadm/models/bartowski--google_gemma-4-31B-it-GGUF/google_gemma-4-31B-it-IQ4_NL.gguf
   # -m /home/ovadm/models/google--gemma-4-31B-it/gemma-4-31b-q6_k.gguf
    -m /home/ovadm/models/google--gemma-4-31B-it/gemma-4-31b-mxfp6.gguf
   --alias local_model
   #--mmproj /home/ovadm/models/mmproj-BF16.gguf
   -np 1   #second app
   --kv-unified  # not splitting context
   --spec-type dflash
   #--ctx-size-draft 8
   --spec-draft-model /home/ovadm/models/Anbeeld--gemma-4-31B-it-DFlash-GGUF/gemma4-31b-it-dflash-Q8_0.gguf
   #--spec-dflash-cross-ctx  1024
   #--spec-draft-ngl 99   #draft engine on cpu 
   #--spec-draft-threads 32
   #--spec-draft-threads-batch 32
   -ngl 99 #brain on gpu
   -t 32
   --draft-max 8
   --draft-min 0
   #--no-spec-dm-adaptive
   --spec-dflash-default
   --spec-draft-p-min 0.25
   --flash-attn on
   --cache-type-k q8_0
   --cache-type-v q8_0
   -b 2048
   -ub 2048
   -c  65536
   --port 9091
   --no-mmap
   --mlock
   --metrics
   --perf 
   --presence-penalty 0
   --temp 0.3
   --top-p 1.0
   --top-k 20
   --min-p 0
   -to 1800
   #--jinja 
   #--reasoning-format deepseek 
   #--chat-template-kwargs '{"enable_thinking":true}'
   --webui-mcp-proxy
)
sudo pkill -9 llama-server
sleep 1
"$BUILD_DIR/bin/llama-server" "${ARGS[@]}" 2>&1 | tee >(stdbuf -oL socat - UDP4-SENDTO:127.0.0.1:5001)  | grep --line-buffered -iE "error|failed|exception|critical|fatal"

**********************************************************************************

 ./gemma-31b-dflsh.sh 
Starting BeeLlama Server with DFlash speculation on gfx1151...
gguf_init_from_file: failed to open GGUF file '/home/ovadm/models/google--gemma-4-31B-it/gemma-4-31b-mxfp6.gguf' (No such file or directory)
llama_model_load: error loading model: llama_model_loader: failed to load model from /home/ovadm/models/google--gemma-4-31B-it/gemma-4-31b-mxfp6.gguf
llama_model_load_from_file_impl: failed to load model
common_fit_params: encountered an error while trying to fit params to free device memory: failed to load model
gguf_init_from_file: failed to open GGUF file '/home/ovadm/models/google--gemma-4-31B-it/gemma-4-31b-mxfp6.gguf' (No such file or directory)
llama_model_load: error loading model: llama_model_loader: failed to load model from /home/ovadm/models/google--gemma-4-31B-it/gemma-4-31b-mxfp6.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/home/ovadm/models/google--gemma-4-31B-it/gemma-4-31b-mxfp6.gguf'
srv    load_model: failed to load model, '/home/ovadm/models/google--gemma-4-31B-it/gemma-4-31b-mxfp6.gguf'
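
The log above reports `No such file or directory` from `gguf_init_from_file`, which suggests the file could not be opened at all, so a path problem may be worth ruling out before looking at the type ID. A minimal pre-flight sketch for the launch script follows; `check_model_path` is a hypothetical helper, not part of llama.cpp, and the only format assumption is that a GGUF file begins with the four ASCII bytes `GGUF`.

```shell
#!/bin/bash
# Hypothetical helper (not part of llama.cpp): verify a model path before
# starting the server, so path problems are separated from format problems.
# Return codes: 0 = ok, 1 = missing, 2 = not a readable regular file,
#               3 = readable but does not start with the GGUF magic bytes.
check_model_path() {
    local path="$1"
    if [[ ! -e "$path" ]]; then
        echo "missing: $path"
        return 1
    fi
    if [[ ! -f "$path" || ! -r "$path" ]]; then
        echo "unreadable: $path"
        return 2
    fi
    # A GGUF file starts with the ASCII magic "GGUF".
    if [[ "$(head -c 4 -- "$path" 2>/dev/null)" != "GGUF" ]]; then
        echo "not-gguf: $path"
        return 3
    fi
    echo "ok: $path"
}

# Example use in a launch script (left commented so the block has no side effects):
# check_model_path "/home/ovadm/models/google--gemma-4-31B-it/gemma-4-31b-mxfp6.gguf" || exit 1
```

If this prints `ok:` and the server still refuses the model, the failure is more likely in how the file's contents are parsed than in the path.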

Problem description & steps to reproduce

Load any mxfp6 model (mxfp4 can only be used with MoE models).
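
One way to confirm the clash is to scan the type enum for numeric values assigned to more than one name. The sketch below assumes the fork keeps its tensor types as `GGML_TYPE_* = <number>` entries in a C header, as upstream does in `ggml/include/ggml.h`; `find_duplicate_type_ids` is a hypothetical helper, and the actual enum names used for mxfp6 and turbo3 in this fork are not assumed here.

```shell
#!/bin/bash
# Hypothetical helper: read a C header on stdin and print every numeric value
# that is assigned to more than one GGML_TYPE_* name.
# Output format, one line per clashing value: "<id>: NAME_A NAME_B"
# Note: entries inside comments are matched too, so inspect hits by hand.
find_duplicate_type_ids() {
    grep -oE 'GGML_TYPE_[A-Za-z0-9_]+[[:space:]]*=[[:space:]]*[0-9]+' \
        | sed -E 's/[[:space:]]*=[[:space:]]*/ /' \
        | awk '{
              # $1 is the enum name, $2 is its numeric value
              if ($2 in names) names[$2] = names[$2] " " $1
              else names[$2] = $1
              count[$2]++
          }
          END {
              for (id in count) if (count[id] > 1) print id ": " names[id]
          }' \
        | sort -n
}

# Example use from the repository root (path as in upstream; assumed for the fork):
# find_duplicate_type_ids < ggml/include/ggml.h
```

An empty result means no two names share a value in that header; a line beginning with `42:` would list the names that collide.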

First Bad Commit

No response

Relevant log output

Logs
