Description
Feature request
I would like to request support for the Qwen3-Reranker model (specifically Qwen3-Reranker-0.6B) in the text-embeddings-inference repository.
Currently, attempting to convert Qwen3-Reranker from Qwen3ForCausalLM to Qwen3ForSequenceClassification fails, with the error message indicating that the classifier model type is not supported for Qwen3.
Additional Context:
The Qwen3-Reranker model has been discussed on HuggingFace (reference: https://huggingface.co/Qwen/Qwen3-Reranker-0.6B/discussions/3), but proper integration with the inference server seems to require additional support.
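For reference, the conversion that triggers the error amounts to re-advertising the checkpoint as a sequence-classification model. A minimal sketch of that metadata rewrite is below; the function name `to_sequence_classification` is mine, the field names follow the standard Hugging Face `config.json` layout, and this only changes metadata (the classification head weights still have to come from somewhere, e.g. the lm_head logits for the yes/no tokens as discussed in the linked thread):

```python
import json

def to_sequence_classification(config: dict, num_labels: int = 1) -> dict:
    """Rewrite a Qwen3ForCausalLM config.json dict so the checkpoint is
    advertised as a sequence-classification (reranker) model.

    Note: metadata only -- the score head itself is not created here.
    """
    converted = dict(config)
    converted["architectures"] = ["Qwen3ForSequenceClassification"]
    # Rerankers emit a single relevance score per (query, document) pair.
    converted["num_labels"] = num_labels
    converted["id2label"] = {str(i): f"LABEL_{i}" for i in range(num_labels)}
    converted["label2id"] = {f"LABEL_{i}": i for i in range(num_labels)}
    return converted

# Minimal stand-in for Qwen3-Reranker-0.6B's config.json:
causal_config = {"architectures": ["Qwen3ForCausalLM"], "model_type": "qwen3"}
print(json.dumps(to_sequence_classification(causal_config), indent=2))
```

Even with the config rewritten this way, TEI's candle backend rejects the model, which is what the traceback below shows.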
Tested with Docker image `ghcr.io/huggingface/text-embeddings-inference:turing-1.7.2`.

Error traceback:

```
rerank-qwen3 | 2025-06-17T02:12:36.220459Z  INFO text_embeddings_router: router/src/lib.rs:235: Starting model backend
rerank-qwen3 | 2025-06-17T02:12:36.639564Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:463: Starting FlashQwen3 model on Cuda(CudaDevice(DeviceId(1)))
rerank-qwen3 | 2025-06-17T02:12:36.640020Z ERROR text_embeddings_backend: backends/src/lib.rs:388: Could not start Candle backend: Could not start backend: `classifier` model type is not supported for Qwen3
rerank-qwen3 | Error: Could not create backend
rerank-qwen3 |
rerank-qwen3 | Caused by:
rerank-qwen3 |     Could not start backend: Could not start a suitable backend
```
Requested Features:
- Add support for the Qwen3-Reranker model architecture
- Implement proper handling of the sequence classification variant (Qwen3ForSequenceClassification)
- Include the model in the supported model types for reranking tasks
Use Case:
This would enable users to deploy Qwen3-Reranker as part of their embedding and retrieval pipelines using the optimized inference server.
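To make the use case concrete: a reranker consumes (query, document) pairs formatted into a single prompt, which is what a TEI `/rerank` endpoint would need to produce for this model. The sketch below illustrates that pair-wise input shape; the `<Instruct>/<Query>/<Document>` layout is based on my reading of the Qwen3-Reranker model card, so treat the exact strings and the helper name `format_rerank_pair` as placeholders, not the authoritative template:

```python
def format_rerank_pair(
    query: str,
    document: str,
    instruction: str = ("Given a web search query, retrieve relevant "
                        "passages that answer the query"),
) -> str:
    """Build a single reranker input from a (query, document) pair.

    Placeholder layout only -- the real chat template is defined on the
    Qwen3-Reranker model card and should be taken from there.
    """
    return f"<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {document}"

# One formatted input per candidate document for a single query:
pairs = [
    format_rerank_pair("what is TEI?", doc)
    for doc in ["TEI is an optimized inference server.", "Bananas are yellow."]
]
for p in pairs:
    print(p)
```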
Would you be able to provide guidance on what would be needed to implement this support? I'm happy to provide additional details or testing if needed.
Motivation
Qwen3-Reranker is a high-performance reranking model developed by Alibaba Cloud, offering a strong balance between efficiency and accuracy for retrieval-augmented generation (RAG) and semantic search tasks. Currently, text-embeddings-inference (TEI) does not support Qwen3ForSequenceClassification, making it difficult to deploy Qwen3-Reranker in optimized inference pipelines.
Supporting Qwen3-Reranker in TEI would:
- Enable seamless integration with existing RAG and search systems.
- Provide optimized inference (e.g., FlashAttention, dynamic batching) compared to manual deployment.
- Expand TEI's coverage of popular open-weight models, aligning with the growing adoption of the Qwen series (Qwen2, Qwen1.5, etc.).
Given the increasing use of Qwen models in industry and research, adding native support for Qwen3-Reranker would significantly improve user experience and broaden TEI's applicability.
Your contribution
I'm opening this issue to request support for Qwen3-Reranker. While I don't have a concrete implementation yet, I'm happy to:
- Provide testing on different hardware environments
- Share benchmark results
- Collaborate on validating any potential solutions