System Info
Image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.3
Information
- [x] Docker
- [ ] The CLI directly
Tasks
- [x] An officially supported command
- [ ] My own modifications
Reproduction
Model: Qwen/Qwen3-Embedding-0.6B
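I launch the container via Docker Compose; a roughly equivalent plain `docker run` (the 8080:80 port mapping and the `./tei-data` cache volume are just examples from my setup) looks like this:

```shell
docker run -p 8080:80 -v "$PWD/tei-data:/data" --pull always \
  ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.3 \
  --model-id Qwen/Qwen3-Embedding-0.6B
```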
Error output:
embedding-model-1 | 2025-07-01T02:13:27.612853Z INFO text_embeddings_router: router/src/main.rs:189: Args { model_id: "Qwe*/*****-*********-0.6B", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hf_token: None, hostname: "036ccc928ee3", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
embedding-model-1 | 2025-07-01T02:13:27.735068Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
embedding-model-1 | 2025-07-01T02:13:27.735088Z INFO download_artifacts:download_pool_config: text_embeddings_core::download: core/src/download.rs:53: Downloading `1_Pooling/config.json`
embedding-model-1 | 2025-07-01T02:13:29.909101Z INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
embedding-model-1 | 2025-07-01T02:13:29.909177Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
embedding-model-1 | 2025-07-01T02:13:29.909202Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
embedding-model-1 | 2025-07-01T02:13:29.909231Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 2.174166763s
embedding-model-1 | 2025-07-01T02:13:30.318102Z WARN text_embeddings_router: router/src/lib.rs:189: Could not find a Sentence Transformers config
embedding-model-1 | 2025-07-01T02:13:30.318120Z INFO text_embeddings_router: router/src/lib.rs:193: Maximum number of tokens per request: 32768
embedding-model-1 | 2025-07-01T02:13:30.318334Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 8 tokenization workers
embedding-model-1 | 2025-07-01T02:13:30.690802Z INFO text_embeddings_router: router/src/lib.rs:235: Starting model backend
embedding-model-1 | 2025-07-01T02:13:30.692087Z INFO text_embeddings_backend: backends/src/lib.rs:551: Downloading `model.onnx`
embedding-model-1 | 2025-07-01T02:13:30.937237Z WARN text_embeddings_backend: backends/src/lib.rs:555: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/Qwen/Qwen3-Embedding-0.6B/resolve/main/model.onnx)
embedding-model-1 | 2025-07-01T02:13:30.937266Z INFO text_embeddings_backend: backends/src/lib.rs:556: Downloading `onnx/model.onnx`
embedding-model-1 | 2025-07-01T02:13:31.183201Z WARN text_embeddings_backend: backends/src/lib.rs:560: Could not download `onnx/model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/Qwen/Qwen3-Embedding-0.6B/resolve/main/onnx/model.onnx)
embedding-model-1 | 2025-07-01T02:13:31.183228Z INFO text_embeddings_backend: backends/src/lib.rs:565: Downloading `model.onnx_data`
embedding-model-1 | 2025-07-01T02:13:31.423084Z WARN text_embeddings_backend: backends/src/lib.rs:569: Could not download `model.onnx_data`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/Qwen/Qwen3-Embedding-0.6B/resolve/main/model.onnx_data)
embedding-model-1 | 2025-07-01T02:13:31.423102Z INFO text_embeddings_backend: backends/src/lib.rs:570: Downloading `onnx/model.onnx_data`
embedding-model-1 | 2025-07-01T02:13:31.665729Z WARN text_embeddings_backend: backends/src/lib.rs:574: Could not download `onnx/model.onnx_data`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/Qwen/Qwen3-Embedding-0.6B/resolve/main/onnx/model.onnx_data)
embedding-model-1 | 2025-07-01T02:13:31.665751Z ERROR text_embeddings_backend: backends/src/lib.rs:363: Model ONNX files not found in the repository. You can easily create ONNX files using the following scripts: https://gist.github.com/tomaarsen/4b00b0e3be8884efa64cfab9230b161f, or use this Space: https://huggingface.co/spaces/sentence-transformers/backend-export
embedding-model-1 | 2025-07-01T02:13:31.665805Z ERROR text_embeddings_backend: backends/src/lib.rs:375: Could not start ORT backend: Could not start backend: File at `/data/models--Qwen--Qwen3-Embedding-0.6B/snapshots/c54f2e6e80b2d7b7de06f51cec4959f6b3e03418/onnx/model.onnx` does not exist
embedding-model-1 | 2025-07-01T02:13:31.665818Z INFO text_embeddings_backend: backends/src/lib.rs:510: Downloading `model.safetensors`
embedding-model-1 | 2025-07-01T02:13:31.665866Z INFO text_embeddings_backend: backends/src/lib.rs:394: Model weights downloaded in 47.74µs
embedding-model-1 | 2025-07-01T02:13:31.666795Z INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:279: Starting Qwen3 model on Cpu
embedding-model-1 |
embedding-model-1 | Intel MKL ERROR: Parameter 8 was incorrect on entry to SGEMM .
embedding-model-1 |
embedding-model-1 | Intel MKL ERROR: Parameter 13 was incorrect on entry to SGEMM .
embedding-model-1 |
embedding-model-1 | Intel MKL ERROR: Parameter 13 was incorrect on entry to SGEMM .
embedding-model-1 |
embedding-model-1 | Intel MKL ERROR: Parameter 13 was incorrect on entry to SGEMM .
embedding-model-1 exited with code 0
I also tried the ONNX model: onnx-community/Qwen3-Embedding-0.6B-ONNX
Error output:
embedding-model-1 | 2025-07-01T02:16:12.595977Z INFO text_embeddings_router: router/src/main.rs:189: Args { model_id: "onn*-*********/*****-*********-*.**-*NNX", revision: None, tokenization_workers: None, dtype: None, pooling: Some(Mean), max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hf_token: None, hostname: "05c0aa0b7629", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
embedding-model-1 | 2025-07-01T02:16:12.717231Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
embedding-model-1 | 2025-07-01T02:16:14.422784Z INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
embedding-model-1 | 2025-07-01T02:16:14.665394Z WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:36: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/onnx-community/Qwen3-Embedding-0.6B-ONNX/resolve/main/config_sentence_transformers.json)
embedding-model-1 | 2025-07-01T02:16:14.665413Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
embedding-model-1 | 2025-07-01T02:16:14.665467Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
embedding-model-1 | 2025-07-01T02:16:14.665498Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 1.94827036s
embedding-model-1 | 2025-07-01T02:16:15.106335Z WARN text_embeddings_router: router/src/lib.rs:189: Could not find a Sentence Transformers config
embedding-model-1 | 2025-07-01T02:16:15.106355Z INFO text_embeddings_router: router/src/lib.rs:193: Maximum number of tokens per request: 32768
embedding-model-1 | 2025-07-01T02:16:15.106546Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 8 tokenization workers
embedding-model-1 | 2025-07-01T02:16:15.479361Z INFO text_embeddings_router: router/src/lib.rs:235: Starting model backend
embedding-model-1 | 2025-07-01T02:16:15.480697Z INFO text_embeddings_backend: backends/src/lib.rs:551: Downloading `model.onnx`
embedding-model-1 | 2025-07-01T02:16:15.735010Z WARN text_embeddings_backend: backends/src/lib.rs:555: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/onnx-community/Qwen3-Embedding-0.6B-ONNX/resolve/main/model.onnx)
embedding-model-1 | 2025-07-01T02:16:15.735024Z INFO text_embeddings_backend: backends/src/lib.rs:556: Downloading `onnx/model.onnx`
embedding-model-1 | 2025-07-01T02:16:15.735087Z INFO text_embeddings_backend: backends/src/lib.rs:565: Downloading `model.onnx_data`
embedding-model-1 | 2025-07-01T02:16:15.985354Z WARN text_embeddings_backend: backends/src/lib.rs:569: Could not download `model.onnx_data`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/onnx-community/Qwen3-Embedding-0.6B-ONNX/resolve/main/model.onnx_data)
embedding-model-1 | 2025-07-01T02:16:15.985375Z INFO text_embeddings_backend: backends/src/lib.rs:570: Downloading `onnx/model.onnx_data`
embedding-model-1 | 2025-07-01T02:16:15.985453Z INFO text_embeddings_backend: backends/src/lib.rs:366: Model ONNX weights downloaded in 504.756511ms
embedding-model-1 | 2025-07-01T02:16:22.818478Z WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/ort-2.0.0-rc.9/src/environment.rs:419: Non-zero status code returned while running Concat node. Name:'/model/layers.0/attn/v_proj/repeat_kv/Concat_1' Status Message: /home/runner/work/ort-artifacts-staging/ort-artifacts-staging/onnxruntime/include/onnxruntime/core/framework/op_kernel_context.h:42 const T* onnxruntime::OpKernelContext::Input(int) const [with T = onnxruntime::Tensor] Missing Input: past_key_values.0.value
embedding-model-1 |
embedding-model-1 | Error: Model backend is not healthy
embedding-model-1 |
embedding-model-1 | Caused by:
embedding-model-1 | Non-zero status code returned while running Concat node. Name:'/model/layers.0/attn/v_proj/repeat_kv/Concat_1' Status Message: /home/runner/work/ort-artifacts-staging/ort-artifacts-staging/onnxruntime/include/onnxruntime/core/framework/op_kernel_context.h:42 const T* onnxruntime::OpKernelContext::Input(int) const [with T = onnxruntime::Tensor] Missing Input: past_key_values.0.value
embedding-model-1 |
embedding-model-1 exited with code 0
Launch parameters: --pooling mean
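Roughly equivalent `docker run` for this second attempt (same caveats about port mapping and volume path as above):

```shell
docker run -p 8080:80 -v "$PWD/tei-data:/data" --pull always \
  ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.3 \
  --model-id onnx-community/Qwen3-Embedding-0.6B-ONNX --pooling mean
```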
Expected behavior
I saw that the 1.7.3 release includes some fixes for Qwen3 on CPU, but it doesn't work for me for some reason. I haven't tested the main branch yet.
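Concretely, I expect the container to become healthy and the standard /embed route to return embeddings, e.g. (request shape from the TEI README; the 8080 host port is my own mapping):

```shell
curl 127.0.0.1:8080/embed \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is Deep Learning?"}'
```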