Description
Bounding boxes and coordinates are not accurately preserved with any of the GGUFs for Qwen3-VL, i.e. 8B and 4B.
I also tried the mmproj in f16, bf16 and f32, and this made no difference to accuracy.
This doesn't happen even with 4-bit bitsandbytes-quantised models in Hugging Face Transformers.
I checked clip.cpp in llama.cpp thoroughly, and things seem relatively properly implemented in terms of vision patches and MRoPE.
The visual understanding is still strong, so the problem seems to be coordinate-specific rather than patch encoding per se.
The 4B models just don't output any coordinates at all in GGUF, but again the same model on Hugging Face, even with 4-bit bnb quantisation, works perfectly.
Can you see why the coordinate system is broken?
I tried higher quants for the LLM, like Q6, to see if that would fix it. It's a little better but still wrong.
I wonder if it's a precision issue in a layer that is very sensitive to quantisation?
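To put a number on "a little better but still wrong", one option is to score each GGUF box against the Transformers reference box with intersection-over-union (IoU). This is only a minimal sketch: it assumes both boxes are given as (x1, y1, x2, y2) in the same coordinate space, and the `iou` helper and the example boxes are hypothetical placeholders, not actual model outputs.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes.

    Assumes x1 <= x2 and y1 <= y2, and that both boxes use the
    same coordinate space (e.g. both in pixels).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped to zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical placeholder boxes, NOT real outputs from either backend.
    reference_box = (100, 100, 300, 300)  # e.g. Transformers + bnb 4-bit
    candidate_box = (120, 110, 320, 310)  # e.g. a GGUF quant
    print(f"IoU = {iou(reference_box, candidate_box):.3f}")
```

Running this per quant (Q4, Q6, and so on) on the same image and prompt would show whether accuracy actually improves with precision or stays off by a roughly constant amount, which would point more towards a coordinate or position-encoding problem than a quantisation one.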
Below are the correct coordinates from the Qwen3-4B model on Hugging Face using bnb 4-bit quantisation:
