
Can't start Qwen3 0.6B on CPU #667

Open
@teamclouday

Description


System Info

Image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.3

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Model: Qwen/Qwen3-Embedding-0.6B
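The `embedding-model-1` prefix in the logs suggests the container is run via Docker Compose. A minimal compose file reproducing this setup might look like the following sketch — the service name, host cache path, and port mapping are assumptions; only the image and model id come from this report:

```yaml
services:
  embedding-model:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.3
    command: --model-id Qwen/Qwen3-Embedding-0.6B
    volumes:
      - ./data:/data   # hypothetical host path for the HF hub cache
    ports:
      - "8080:80"
```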
Error output:

embedding-model-1  | 2025-07-01T02:13:27.612853Z  INFO text_embeddings_router: router/src/main.rs:189: Args { model_id: "Qwe*/*****-*********-0.6B", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hf_token: None, hostname: "036ccc928ee3", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
embedding-model-1  | 2025-07-01T02:13:27.735068Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
embedding-model-1  | 2025-07-01T02:13:27.735088Z  INFO download_artifacts:download_pool_config: text_embeddings_core::download: core/src/download.rs:53: Downloading `1_Pooling/config.json`
embedding-model-1  | 2025-07-01T02:13:29.909101Z  INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
embedding-model-1  | 2025-07-01T02:13:29.909177Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
embedding-model-1  | 2025-07-01T02:13:29.909202Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
embedding-model-1  | 2025-07-01T02:13:29.909231Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 2.174166763s
embedding-model-1  | 2025-07-01T02:13:30.318102Z  WARN text_embeddings_router: router/src/lib.rs:189: Could not find a Sentence Transformers config
embedding-model-1  | 2025-07-01T02:13:30.318120Z  INFO text_embeddings_router: router/src/lib.rs:193: Maximum number of tokens per request: 32768
embedding-model-1  | 2025-07-01T02:13:30.318334Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 8 tokenization workers
embedding-model-1  | 2025-07-01T02:13:30.690802Z  INFO text_embeddings_router: router/src/lib.rs:235: Starting model backend
embedding-model-1  | 2025-07-01T02:13:30.692087Z  INFO text_embeddings_backend: backends/src/lib.rs:551: Downloading `model.onnx`
embedding-model-1  | 2025-07-01T02:13:30.937237Z  WARN text_embeddings_backend: backends/src/lib.rs:555: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/Qwen/Qwen3-Embedding-0.6B/resolve/main/model.onnx)
embedding-model-1  | 2025-07-01T02:13:30.937266Z  INFO text_embeddings_backend: backends/src/lib.rs:556: Downloading `onnx/model.onnx`
embedding-model-1  | 2025-07-01T02:13:31.183201Z  WARN text_embeddings_backend: backends/src/lib.rs:560: Could not download `onnx/model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/Qwen/Qwen3-Embedding-0.6B/resolve/main/onnx/model.onnx)
embedding-model-1  | 2025-07-01T02:13:31.183228Z  INFO text_embeddings_backend: backends/src/lib.rs:565: Downloading `model.onnx_data`
embedding-model-1  | 2025-07-01T02:13:31.423084Z  WARN text_embeddings_backend: backends/src/lib.rs:569: Could not download `model.onnx_data`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/Qwen/Qwen3-Embedding-0.6B/resolve/main/model.onnx_data)
embedding-model-1  | 2025-07-01T02:13:31.423102Z  INFO text_embeddings_backend: backends/src/lib.rs:570: Downloading `onnx/model.onnx_data`
embedding-model-1  | 2025-07-01T02:13:31.665729Z  WARN text_embeddings_backend: backends/src/lib.rs:574: Could not download `onnx/model.onnx_data`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/Qwen/Qwen3-Embedding-0.6B/resolve/main/onnx/model.onnx_data)
embedding-model-1  | 2025-07-01T02:13:31.665751Z ERROR text_embeddings_backend: backends/src/lib.rs:363: Model ONNX files not found in the repository. You can easily create ONNX files using the following scripts: https://gist.github.com/tomaarsen/4b00b0e3be8884efa64cfab9230b161f, or use this Space: https://huggingface.co/spaces/sentence-transformers/backend-export
embedding-model-1  | 2025-07-01T02:13:31.665805Z ERROR text_embeddings_backend: backends/src/lib.rs:375: Could not start ORT backend: Could not start backend: File at `/data/models--Qwen--Qwen3-Embedding-0.6B/snapshots/c54f2e6e80b2d7b7de06f51cec4959f6b3e03418/onnx/model.onnx` does not exist
embedding-model-1  | 2025-07-01T02:13:31.665818Z  INFO text_embeddings_backend: backends/src/lib.rs:510: Downloading `model.safetensors`
embedding-model-1  | 2025-07-01T02:13:31.665866Z  INFO text_embeddings_backend: backends/src/lib.rs:394: Model weights downloaded in 47.74µs
embedding-model-1  | 2025-07-01T02:13:31.666795Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:279: Starting Qwen3 model on Cpu
embedding-model-1  | 
embedding-model-1  | Intel MKL ERROR: Parameter 8 was incorrect on entry to SGEMM .
embedding-model-1  | 
embedding-model-1  | Intel MKL ERROR: Parameter 13 was incorrect on entry to SGEMM .
embedding-model-1  | 
embedding-model-1  | Intel MKL ERROR: Parameter 13 was incorrect on entry to SGEMM .
embedding-model-1  | 
embedding-model-1  | Intel MKL ERROR: Parameter 13 was incorrect on entry to SGEMM .
embedding-model-1 exited with code 0

Also tried the ONNX model onnx-community/Qwen3-Embedding-0.6B-ONNX.
Error output:

embedding-model-1  | 2025-07-01T02:16:12.595977Z  INFO text_embeddings_router: router/src/main.rs:189: Args { model_id: "onn*-*********/*****-*********-*.**-*NNX", revision: None, tokenization_workers: None, dtype: None, pooling: Some(Mean), max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hf_token: None, hostname: "05c0aa0b7629", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
embedding-model-1  | 2025-07-01T02:16:12.717231Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
embedding-model-1  | 2025-07-01T02:16:14.422784Z  INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
embedding-model-1  | 2025-07-01T02:16:14.665394Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:36: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/onnx-community/Qwen3-Embedding-0.6B-ONNX/resolve/main/config_sentence_transformers.json)
embedding-model-1  | 2025-07-01T02:16:14.665413Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
embedding-model-1  | 2025-07-01T02:16:14.665467Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
embedding-model-1  | 2025-07-01T02:16:14.665498Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 1.94827036s
embedding-model-1  | 2025-07-01T02:16:15.106335Z  WARN text_embeddings_router: router/src/lib.rs:189: Could not find a Sentence Transformers config
embedding-model-1  | 2025-07-01T02:16:15.106355Z  INFO text_embeddings_router: router/src/lib.rs:193: Maximum number of tokens per request: 32768
embedding-model-1  | 2025-07-01T02:16:15.106546Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 8 tokenization workers
embedding-model-1  | 2025-07-01T02:16:15.479361Z  INFO text_embeddings_router: router/src/lib.rs:235: Starting model backend
embedding-model-1  | 2025-07-01T02:16:15.480697Z  INFO text_embeddings_backend: backends/src/lib.rs:551: Downloading `model.onnx`
embedding-model-1  | 2025-07-01T02:16:15.735010Z  WARN text_embeddings_backend: backends/src/lib.rs:555: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/onnx-community/Qwen3-Embedding-0.6B-ONNX/resolve/main/model.onnx)
embedding-model-1  | 2025-07-01T02:16:15.735024Z  INFO text_embeddings_backend: backends/src/lib.rs:556: Downloading `onnx/model.onnx`
embedding-model-1  | 2025-07-01T02:16:15.735087Z  INFO text_embeddings_backend: backends/src/lib.rs:565: Downloading `model.onnx_data`
embedding-model-1  | 2025-07-01T02:16:15.985354Z  WARN text_embeddings_backend: backends/src/lib.rs:569: Could not download `model.onnx_data`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/onnx-community/Qwen3-Embedding-0.6B-ONNX/resolve/main/model.onnx_data)
embedding-model-1  | 2025-07-01T02:16:15.985375Z  INFO text_embeddings_backend: backends/src/lib.rs:570: Downloading `onnx/model.onnx_data`
embedding-model-1  | 2025-07-01T02:16:15.985453Z  INFO text_embeddings_backend: backends/src/lib.rs:366: Model ONNX weights downloaded in 504.756511ms
embedding-model-1  | 2025-07-01T02:16:22.818478Z  WARN ort::environment: /usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/ort-2.0.0-rc.9/src/environment.rs:419: Non-zero status code returned while running Concat node. Name:'/model/layers.0/attn/v_proj/repeat_kv/Concat_1' Status Message: /home/runner/work/ort-artifacts-staging/ort-artifacts-staging/onnxruntime/include/onnxruntime/core/framework/op_kernel_context.h:42 const T* onnxruntime::OpKernelContext::Input(int) const [with T = onnxruntime::Tensor] Missing Input: past_key_values.0.value
embedding-model-1  | 
embedding-model-1  | Error: Model backend is not healthy
embedding-model-1  | 
embedding-model-1  | Caused by:
embedding-model-1  |     Non-zero status code returned while running Concat node. Name:'/model/layers.0/attn/v_proj/repeat_kv/Concat_1' Status Message: /home/runner/work/ort-artifacts-staging/ort-artifacts-staging/onnxruntime/include/onnxruntime/core/framework/op_kernel_context.h:42 const T* onnxruntime::OpKernelContext::Input(int) const [with T = onnxruntime::Tensor] Missing Input: past_key_values.0.value
embedding-model-1  |     
embedding-model-1 exited with code 0

Launch parameters: --pooling mean
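For reference, the ONNX attempt corresponds to a compose service along these lines — again a sketch with an assumed service name, cache path, and port mapping; only the model id and the `--pooling mean` flag come from this report:

```yaml
services:
  embedding-model:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.3
    command: --model-id onnx-community/Qwen3-Embedding-0.6B-ONNX --pooling mean
    volumes:
      - ./data:/data   # hypothetical host path for the HF hub cache
    ports:
      - "8080:80"
```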

Expected behavior

I saw that the 1.7.3 release includes some fixes for Qwen3 on CPU, but for some reason it still doesn't work for me. I haven't tested the main branch yet.
