
Coordinates not accurate with Qwen3-VL using llama.cpp python api #20

@sujitvasanth

Description

[Image: inaccurate bounding boxes from the GGUF model]

Bounding boxes and coordinates are not accurately preserved with any of the GGUFs for Qwen3-VL, i.e. the 8B and 4B models (see the overlay sketch below).
I also tried the mmproj in F16, BF16 and F32 and this made no difference to accuracy.
This doesn't happen even with 4-bit bitsandbytes quantised models in Hugging Face Transformers.
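For reference, this is roughly how I'm overlaying the returned coordinates. It's a minimal sketch: the `bbox_2d` JSON format and the 0-1000 normalized coordinate space are assumptions carried over from earlier Qwen-VL releases, so adjust the parsing/scaling to whatever the model actually emits.

```python
# Sketch: overlay boxes parsed from the model's text reply onto the original image.
# Assumes Qwen-style JSON output like [{"bbox_2d": [x1, y1, x2, y2], "label": "..."}]
# with coordinates normalized to a 0-1000 grid -- both are assumptions, adjust as needed.
import json
import re

from PIL import Image, ImageDraw


def parse_boxes(text: str):
    """Pull the first JSON array out of the model's reply and return its box dicts."""
    match = re.search(r"\[.*\]", text, re.DOTALL)
    return json.loads(match.group(0)) if match else []


def draw_boxes(image_path: str, reply: str, out_path: str = "boxes.png"):
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    w, h = img.size
    for det in parse_boxes(reply):
        x1, y1, x2, y2 = det["bbox_2d"]
        # Rescale from the assumed 0-1000 normalized space to original pixels.
        box = (x1 / 1000 * w, y1 / 1000 * h, x2 / 1000 * w, y2 / 1000 * h)
        draw.rectangle(box, outline="red", width=3)
        draw.text((box[0], box[1]), str(det.get("label", "")), fill="red")
    img.save(out_path)
```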

I checked clip.cpp in llama.cpp thoroughly and things seem properly implemented in terms of vision patching and M-RoPE.
The visual understanding is still strong, so the problem seems to be coordinate-specific rather than patch encoding per se (see the resize sketch below).
The 4B model just doesn't output any coordinates at all in GGUF, but again the same model on Hugging Face works perfectly, even with 4-bit bnb quantisation.
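My understanding is that the HF processor chooses the preprocessed resolution with a "smart resize" that snaps to multiples of patch_size × merge_size; if the GGUF pipeline settles on a different resized resolution, or the coordinates get interpreted against the wrong reference size, boxes would drift systematically while visual understanding stays fine. A sketch of that resize arithmetic, assuming Qwen3-VL still follows the Qwen2.5-VL convention (the factor and min/max pixel defaults here are assumptions):

```python
# Sketch of the Qwen-VL-style "smart resize" that decides the preprocessed resolution.
# Assumes Qwen3-VL follows the Qwen2.5-VL convention (factor = patch_size * merge_size = 28);
# the min/max pixel defaults are assumptions and may differ for Qwen3-VL.
import math


def smart_resize(height: int, width: int, factor: int = 28,
                 min_pixels: int = 56 * 56, max_pixels: int = 14 * 14 * 4 * 1280):
    """Round (height, width) to multiples of `factor`, keeping the pixel count in range."""
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


# If clip.cpp resizes to a different (h_bar, w_bar) than the HF processor,
# any coordinates expressed relative to that resolution will map back incorrectly.
print(smart_resize(1080, 1920))
```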

Can you see why the coordinate system is broken?
I tried higher quants for the LLM, like Q6, to see if that would fix it... it's a little better but still wrong.
I wonder if it's a precision issue in a layer that's very sensitive?
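One quick way to tell the two hypotheses apart is to fit a single scale/offset between the GGUF boxes and the HF reference boxes: if a clean linear fit explains most of the error, a coordinate-space/resize mismatch is more likely; if the residuals look random, precision noise is more plausible. A rough sketch (the box values below are placeholders, not real measurements):

```python
# Sketch: check whether the GGUF coordinate error is a systematic linear mapping
# (scale/offset mismatch) or random noise. The box values below are placeholders.
import numpy as np

hf_boxes = np.array([[120, 80, 340, 260],    # reference boxes from the HF model
                     [400, 150, 620, 300]], dtype=float)
gguf_boxes = np.array([[98, 65, 280, 212],   # boxes from the GGUF model, same prompts
                       [330, 122, 510, 245]], dtype=float)

x = gguf_boxes.ravel()
y = hf_boxes.ravel()
# Least-squares fit of y ~= a * x + b
A = np.stack([x, np.ones_like(x)], axis=1)
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
residual = y - (a * x + b)
print(f"scale={a:.3f} offset={b:.1f} max residual={np.abs(residual).max():.1f}")
# Small residuals point to a coordinate-space/resize mismatch;
# large, unstructured residuals point more toward precision/quantisation noise.
```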

Below are the correct coordinates from the Qwen3-VL-4B model on Hugging Face using bnb 4-bit quantisation:

[Image: correct bounding boxes from the Hugging Face model]
