Description
Feature request
I would like to request support for the Qwen3-Reranker model (specifically Qwen3-Reranker-0.6B) in the text-embeddings-inference repository.
Currently, attempting to convert Qwen3-Reranker from Qwen3ForCausalLM to Qwen3ForSequenceClassification fails, with the error message indicating that the classifier model type is not supported for Qwen3.
Additional Context:
The Qwen3-Reranker model has been discussed on HuggingFace (reference: https://huggingface.co/Qwen/Qwen3-Reranker-0.6B/discussions/3), but proper integration with the inference server seems to require additional support.
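For reference, the conversion that triggers the error amounts to re-advertising the checkpoint as a sequence-classification model. A minimal sketch of that metadata rewrite is below; the function name `to_sequence_classification` is mine, the field names follow the standard Hugging Face `config.json` layout, and this only changes metadata (the classification head weights still have to come from somewhere, e.g. the lm_head logits for the yes/no tokens as discussed in the linked thread):

```python
import json

def to_sequence_classification(config: dict, num_labels: int = 1) -> dict:
    """Rewrite a Qwen3ForCausalLM config.json dict so the checkpoint is
    advertised as a sequence-classification (reranker) model.

    Note: metadata only -- the score head itself is not created here.
    """
    converted = dict(config)
    converted["architectures"] = ["Qwen3ForSequenceClassification"]
    # Rerankers emit a single relevance score per (query, document) pair.
    converted["num_labels"] = num_labels
    converted["id2label"] = {str(i): f"LABEL_{i}" for i in range(num_labels)}
    converted["label2id"] = {f"LABEL_{i}": i for i in range(num_labels)}
    return converted

# Minimal stand-in for Qwen3-Reranker-0.6B's config.json:
causal_config = {"architectures": ["Qwen3ForCausalLM"], "model_type": "qwen3"}
print(json.dumps(to_sequence_classification(causal_config), indent=2))
```

Even with the config rewritten this way, TEI's candle backend rejects the model, which is what the traceback below shows.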
Tested with Docker image `ghcr.io/huggingface/text-embeddings-inference:turing-1.7.2`.

Error traceback:

```
rerank-qwen3 | 2025-06-17T02:12:36.220459Z  INFO text_embeddings_router: router/src/lib.rs:235: Starting model backend
rerank-qwen3 | 2025-06-17T02:12:36.639564Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:463: Starting FlashQwen3 model on Cuda(CudaDevice(DeviceId(1)))
rerank-qwen3 | 2025-06-17T02:12:36.640020Z ERROR text_embeddings_backend: backends/src/lib.rs:388: Could not start Candle backend: Could not start backend: `classifier` model type is not supported for Qwen3
rerank-qwen3 | Error: Could not create backend
rerank-qwen3 |
rerank-qwen3 | Caused by:
rerank-qwen3 |     Could not start backend: Could not start a suitable backend
```
Requested Features:
- Add support for the Qwen3-Reranker model architecture
- Implement proper handling of the sequence classification variant (Qwen3ForSequenceClassification)
- Include the model in the supported model types for reranking tasks
Use Case:
This would enable users to deploy Qwen3-Reranker as part of their embedding and retrieval pipelines using the optimized inference server.
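To make the use case concrete: a reranker consumes (query, document) pairs formatted into a single prompt, which is what a TEI `/rerank` endpoint would need to produce for this model. The sketch below illustrates that pair-wise input shape; the `<Instruct>/<Query>/<Document>` layout is based on my reading of the Qwen3-Reranker model card, so treat the exact strings and the helper name `format_rerank_pair` as placeholders, not the authoritative template:

```python
def format_rerank_pair(
    query: str,
    document: str,
    instruction: str = ("Given a web search query, retrieve relevant "
                        "passages that answer the query"),
) -> str:
    """Build a single reranker input from a (query, document) pair.

    Placeholder layout only -- the real chat template is defined on the
    Qwen3-Reranker model card and should be taken from there.
    """
    return f"<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {document}"

# One formatted input per candidate document for a single query:
pairs = [
    format_rerank_pair("what is TEI?", doc)
    for doc in ["TEI is an optimized inference server.", "Bananas are yellow."]
]
for p in pairs:
    print(p)
```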
Would you be able to provide guidance on what would be needed to implement this support? I'm happy to provide additional details or testing if needed.
Motivation
Qwen3-Reranker is a high-performance reranking model developed by Alibaba Cloud, offering a strong balance between efficiency and accuracy for retrieval-augmented generation (RAG) and semantic search tasks. Currently, text-embeddings-inference (TEI) does not support Qwen3ForSequenceClassification, making it difficult to deploy Qwen3-Reranker in optimized inference pipelines.
Supporting Qwen3-Reranker in TEI would:
- Enable seamless integration with existing RAG and search systems.
- Provide optimized inference (e.g., FlashAttention, dynamic batching) compared to manual deployment.
- Expand TEI's coverage of popular open-weight models, aligning with the growing adoption of the Qwen series (Qwen2, Qwen1.5, etc.).
Given the increasing use of Qwen models in industry and research, adding native support for Qwen3-Reranker would significantly improve user experience and broaden TEI's applicability.
Your contribution
I'm opening this issue to request support for Qwen3-Reranker. While I don't have a concrete implementation yet, I'm happy to:
- Provide testing on different hardware environments
- Share benchmark results
- Collaborate on validating any potential solutions