
Coordinates not accurate with Qwen3-VL using llama.cpp python api #20

@sujitvasanth

Description

[Image: inaccurate bounding boxes from the GGUF model]

Bounding boxes and coordinates are not accurately preserved with any of the GGUFs for Qwen3-VL, i.e. the 8B and 4B models (see the overlay sketch below).
I also tried the mmproj in F16, BF16 and F32 and this made no difference to accuracy.
This doesn't happen even with 4-bit bitsandbytes quantised models in Hugging Face Transformers.
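For reference, this is roughly how I'm overlaying the returned coordinates. It's a minimal sketch: the `bbox_2d` JSON format and the 0-1000 normalized coordinate space are assumptions carried over from earlier Qwen-VL releases, so adjust the parsing/scaling to whatever the model actually emits.

```python
# Sketch: overlay boxes parsed from the model's text reply onto the original image.
# Assumes Qwen-style JSON output like [{"bbox_2d": [x1, y1, x2, y2], "label": "..."}]
# with coordinates normalized to a 0-1000 grid -- both are assumptions, adjust as needed.
import json
import re

from PIL import Image, ImageDraw


def parse_boxes(text: str):
    """Pull the first JSON array out of the model's reply and return its box dicts."""
    match = re.search(r"\[.*\]", text, re.DOTALL)
    return json.loads(match.group(0)) if match else []


def draw_boxes(image_path: str, reply: str, out_path: str = "boxes.png"):
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    w, h = img.size
    for det in parse_boxes(reply):
        x1, y1, x2, y2 = det["bbox_2d"]
        # Rescale from the assumed 0-1000 normalized space to original pixels.
        box = (x1 / 1000 * w, y1 / 1000 * h, x2 / 1000 * w, y2 / 1000 * h)
        draw.rectangle(box, outline="red", width=3)
        draw.text((box[0], box[1]), str(det.get("label", "")), fill="red")
    img.save(out_path)
```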

I checked clip.cpp in llama.cpp thoroughly and things seem properly implemented in terms of vision patching and M-RoPE.
The visual understanding is still strong, so the problem seems to be coordinate-specific rather than patch encoding per se (see the resize sketch below).
The 4B model just doesn't output any coordinates at all in GGUF, but again the same model on Hugging Face works perfectly, even with 4-bit bnb quantisation.
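My understanding is that the HF processor chooses the preprocessed resolution with a "smart resize" that snaps to multiples of patch_size × merge_size; if the GGUF pipeline settles on a different resized resolution, or the coordinates get interpreted against the wrong reference size, boxes would drift systematically while visual understanding stays fine. A sketch of that resize arithmetic, assuming Qwen3-VL still follows the Qwen2.5-VL convention (the factor and min/max pixel defaults here are assumptions):

```python
# Sketch of the Qwen-VL-style "smart resize" that decides the preprocessed resolution.
# Assumes Qwen3-VL follows the Qwen2.5-VL convention (factor = patch_size * merge_size = 28);
# the min/max pixel defaults are assumptions and may differ for Qwen3-VL.
import math


def smart_resize(height: int, width: int, factor: int = 28,
                 min_pixels: int = 56 * 56, max_pixels: int = 14 * 14 * 4 * 1280):
    """Round (height, width) to multiples of `factor`, keeping the pixel count in range."""
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


# If clip.cpp resizes to a different (h_bar, w_bar) than the HF processor,
# any coordinates expressed relative to that resolution will map back incorrectly.
print(smart_resize(1080, 1920))
```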

Can you see why the coordinate system is broken?
I tried higher quants for the LLM, like Q6, to see if that would fix it... it's a little better but still wrong.
I wonder if it's a precision issue in a layer that's very sensitive?
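One quick way to tell the two hypotheses apart is to fit a single scale/offset between the GGUF boxes and the HF reference boxes: if a clean linear fit explains most of the error, a coordinate-space/resize mismatch is more likely; if the residuals look random, precision noise is more plausible. A rough sketch (the box values below are placeholders, not real measurements):

```python
# Sketch: check whether the GGUF coordinate error is a systematic linear mapping
# (scale/offset mismatch) or random noise. The box values below are placeholders.
import numpy as np

hf_boxes = np.array([[120, 80, 340, 260],    # reference boxes from the HF model
                     [400, 150, 620, 300]], dtype=float)
gguf_boxes = np.array([[98, 65, 280, 212],   # boxes from the GGUF model, same prompts
                       [330, 122, 510, 245]], dtype=float)

x = gguf_boxes.ravel()
y = hf_boxes.ravel()
# Least-squares fit of y ~= a * x + b
A = np.stack([x, np.ones_like(x)], axis=1)
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
residual = y - (a * x + b)
print(f"scale={a:.3f} offset={b:.1f} max residual={np.abs(residual).max():.1f}")
# Small residuals point to a coordinate-space/resize mismatch;
# large, unstructured residuals point more toward precision/quantisation noise.
```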

Below are the correct coordinates from the Qwen3-VL-4B model on Hugging Face using bnb 4-bit quantisation:

[Image: correct bounding boxes from the Hugging Face model]
