Description
Disclaimer: This suggestion may be outside the scope of this project and might be better suited as a completely separate, standalone effort. Feel free to close it without further comment.
I’m wondering whether the qwen3-vl-embedding / reranking models could be integrated into flux2.c in some way. qwen3-vl-embedding is a state-of-the-art model (as of early Jan 2026) for multimodal embeddings: the input can be text, images, or video, and it produces an embedding from the input plus an optional instruction. qwen3-vl-reranking is the complementary reranking model: given an input, a query, and an optional instruction, it outputs a relevance score. Repeating this with multiple inputs and the same query yields a set of relevance scores for the collection.
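The reranking loop described above (score each input against the same query, then order by score) can be sketched as follows. This is only an illustration: `rank_by_relevance` and `toy_score` are hypothetical names, and `toy_score` is a word-overlap placeholder standing in for a real call to the reranker, whose actual API is not shown here.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical scorer signature: (query, candidate, instruction) -> relevance score.
ScoreFn = Callable[[str, str, Optional[str]], float]


def rank_by_relevance(
    query: str,
    candidates: Sequence[str],
    score_fn: ScoreFn,
    instruction: Optional[str] = None,
) -> List[Tuple[float, str]]:
    """Score every candidate against the same query and sort best-first."""
    scored = [(score_fn(query, c, instruction), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def toy_score(query: str, candidate: str, instruction: Optional[str]) -> float:
    """Placeholder scorer (assumption): fraction of query words found in the candidate.

    A real setup would call the reranking model here instead.
    """
    q_words = set(query.lower().split())
    c_words = set(candidate.lower().split())
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


if __name__ == "__main__":
    docs = ["a blue bike", "a red car", "red apple"]
    for score, doc in rank_by_relevance("red car", docs, toy_score):
        print(f"{score:.2f}  {doc}")
```

Passing the scorer in as a function keeps the ranking logic independent of whichever runtime ends up hosting the model.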
qwen3-vl-embedding was trained with quantization-aware training (QAT) and supports Matryoshka embeddings, so the output vectors can be quantized to lower precision (e.g., int4) and truncated to smaller sizes (e.g., from 2048 to 256 dimensions) with minimal performance degradation.
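To make the truncation and quantization idea concrete, here is a minimal sketch. It assumes the usual Matryoshka convention of keeping the leading components and re-normalizing, and it uses a simple symmetric per-vector int4 scheme; the quantization scheme the model was actually trained for may differ, and all function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def truncate_and_normalize(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale to unit L2 norm."""
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    return [x / norm for x in head]


def quantize_int4(vec: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-vector quantization to integer codes in [-7, 7] (assumed scheme)."""
    max_abs = max((abs(x) for x in vec), default=0.0)
    if max_abs == 0.0:
        return [0] * len(vec), 1.0
    scale = max_abs / 7.0
    codes = [max(-7, min(7, round(x / scale))) for x in vec]
    return codes, scale


def dequantize(codes: Sequence[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    full = [3.0, 4.0, 1.0, 0.5]  # toy stand-in for a 2048-dimensional embedding
    small = truncate_and_normalize(full, 2)
    codes, scale = quantize_int4(small)
    print(small, codes, cosine(dequantize(codes, scale), small))
```

Comparing the cosine similarity before and after quantization on real embeddings is one way to check the "minimal degradation" claim for a given use case.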
Both models are available in 2B and 8B parameter versions.
At the moment, these models are not widely supported by the ML/AI ecosystem. They are not yet supported by llama.cpp or by mlx, although GGUF conversion appears to be possible: I’ve tried it with the ggml-org/gguf-my-repo conversion tool.
Perhaps qwen3-vl-embedding could enable additional input modes for flux2.c (e.g., video, image, or text, with an optional instruction to steer the representation).
My main concern is that even though the backbone used to produce the embeddings is still qwen3, the additional training involved in creating qwen3-vl-embedding might have resulted in embeddings that are not compatible with those expected by the flux2 model.
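One crude way to probe this concern would be to embed the same prompts with both encoders and compare the results pairwise. The sketch below assumes both encoders can be made to yield vectors of the same dimensionality for each prompt, which may not hold in practice; the inputs are plain lists and `mean_pairwise_cosine` is a hypothetical helper. A high similarity would not prove compatibility with flux2, but a low one would be a strong hint against it.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_pairwise_cosine(
    base_vecs: Sequence[Sequence[float]],
    new_vecs: Sequence[Sequence[float]],
) -> float:
    """Average cosine similarity between vectors produced for the same prompts.

    base_vecs[i] and new_vecs[i] must come from the same prompt (assumption).
    """
    if len(base_vecs) != len(new_vecs):
        raise ValueError("both encoders must be run on the same prompts")
    if not base_vecs:
        raise ValueError("need at least one prompt")
    sims = [cosine(a, b) for a, b in zip(base_vecs, new_vecs)]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    # Toy vectors standing in for real encoder outputs.
    base = [[1.0, 0.0], [0.0, 1.0]]
    other = [[1.0, 0.0], [1.0, 0.0]]
    print(mean_pairwise_cosine(base, other))
```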
References:
- Qwen3 VL Embedding / Reranking repo: https://github.com/QwenLM/Qwen3-VL-Embedding
- Paper: https://github.com/QwenLM/Qwen3-VL-Embedding/blob/main/assets/qwen3vlembedding_technical_report.pdf
- Hugging Face (Qwen3-VL-Embedding): https://huggingface.co/collections/Qwen/qwen3-vl-embedding
- Hugging Face (Qwen3-VL-Reranker): https://huggingface.co/collections/Qwen/qwen3-vl-reranker