[Bug]: arm64 No module named 'xformers' #13585

@jiayi-1994

Your current environment

The arm64 image I built from source fails at startup with: No module named 'xformers'

INFO 02-19 19:40:50 llm_engine.py:234] Initializing a V0 LLM engine (v0.7.3.dev203+ge2603fef.d20250219) with config: model='deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=True, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=True,
INFO 02-19 19:40:53 cuda.py:178] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 02-19 19:40:53 cuda.py:226] Using XFormers backend.
ERROR 02-19 19:40:53 engine.py:389] No module named 'xformers'
ERROR 02-19 19:40:53 engine.py:389] Traceback (most recent call last):
ERROR 02-19 19:40:53 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 380, in run_mp_engine
ERROR 02-19 19:40:53 engine.py:389] engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 02-19 19:40:53 engine.py:389] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-19 19:40:53 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 123, in from_engine_args
ERROR 02-19 19:40:53 engine.py:389] return cls(ipc_path=ipc_path,
ERROR 02-19 19:40:53 engine.py:389] ^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-19 19:40:53 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 75, in __init__
ERROR 02-19 19:40:53 engine.py:389] self.engine = LLMEngine(*args, **kwargs)
ERROR 02-19 19:40:53 engine.py:389] ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-19 19:40:53 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 273, in __init__
ERROR 02-19 19:40:53 engine.py:389] self.model_executor = executor_class(vllm_config=vllm_config, )
ERROR 02-19 19:40:53 engine.py:389] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-19 19:40:53 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 02-19 19:40:53 engine.py:389] self._init_executor()
ERROR 02-19 19:40:53 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 45, in _init_executor
ERROR 02-19 19:40:53 engine.py:389] self.collective_rpc("init_worker", args=([kwargs], ))
ERROR 02-19 19:40:53 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 02-19 19:40:53 engine.py:389] answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 02-19 19:40:53 engine.py:389] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-19 19:40:53 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2196, in run_method
ERROR 02-19 19:40:53 engine.py:389] return func(*args, **kwargs)
ERROR 02-19 19:40:53 engine.py:389] ^^^^^^^^^^^^^^^^^^^^^
ERROR 02-19 19:40:53 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 565, in init_worker
ERROR 02-19 19:40:53 engine.py:389] self.worker = worker_class(**kwargs)
ERROR 02-19 19:40:53 engine.py:389] ^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-19 19:40:53 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 82, in __init__
ERROR 02-19 19:40:53 engine.py:389] self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
ERROR 02-19 19:40:53 engine.py:389] ^^^^^^^^^^^^^^^^^
ERROR 02-19 19:40:53 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1061, in __init__
ERROR 02-19 19:40:53 engine.py:389] self.attn_backend = get_attn_backend(
ERROR 02-19 19:40:53 engine.py:389] ^^^^^^^^^^^^^^^^^
ERROR 02-19 19:40:53 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/attention/selector.py", line 95, in get_attn_backend
ERROR 02-19 19:40:53 engine.py:389] return _cached_get_attn_backend(
ERROR 02-19 19:40:53 engine.py:389] ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-19 19:40:53 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/attention/selector.py", line 154, in _cached_get_attn_backend
ERROR 02-19 19:40:53 engine.py:389] return resolve_obj_by_qualname(attention_cls)
ERROR 02-19 19:40:53 engine.py:389] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-19 19:40:53 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 1877, in resolve_obj_by_qualname
ERROR 02-19 19:40:53 engine.py:389] module = importlib.import_module(module_name)
ERROR 02-19 19:40:53 engine.py:389] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-19 19:40:53 engine.py:389] File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
ERROR 02-19 19:40:53 engine.py:389] return _bootstrap._gcd_import(name[level:], package, level)
ERROR 02-19 19:40:53 engine.py:389] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-19 19:40:53 engine.py:389] File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
ERROR 02-19 19:40:53 engine.py:389] File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
ERROR 02-19 19:40:53 engine.py:389] File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
ERROR 02-19 19:40:53 engine.py:389] File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
ERROR 02-19 19:40:53 engine.py:389] File "<frozen importlib._bootstrap_external>", line 999, in exec_module
ERROR 02-19 19:40:53 engine.py:389] File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
ERROR 02-19 19:40:53 engine.py:389] File "/usr/local/lib/python3.12/dist-packages/vllm/attention/backends/xformers.py", line 7, in <module>
ERROR 02-19 19:40:53 engine.py:389] from xformers import ops as xops
ERROR 02-19 19:40:53 engine.py:389] ModuleNotFoundError: No module named 'xformers'
Process SpawnProcess-1:
Traceback (most recent call last):
File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 391, in run_mp_engine
raise e
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 380, in run_mp_engine
engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 123, in from_engine_args
return cls(ipc_path=ipc_path,
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 75, in __init__
self.engine = LLMEngine(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 273, in __init__
self.model_executor = executor_class(vllm_config=vllm_config, )
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 52, in __init__
self._init_executor()
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 45, in _init_executor
self.collective_rpc("init_worker", args=([kwargs], ))
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
answer = run_method(self.driver_worker, method, args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2196, in run_method
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 565, in init_worker
self.worker = worker_class(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 82, in __init__
self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1061, in __init__
self.attn_backend = get_attn_backend(
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/attention/selector.py", line 95, in get_attn_backend
return _cached_get_attn_backend(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/attention/selector.py", line 154, in _cached_get_attn_backend
return resolve_obj_by_qualname(attention_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 1877, in resolve_obj_by_qualname
module = importlib.import_module(module_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
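File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 999, in exec_module
File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
File "/usr/local/lib/python3.12/dist-packages/vllm/attention/backends/xformers.py", line 7, in <module>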
from xformers import ops as xops
ModuleNotFoundError: No module named 'xformers'
Traceback (most recent call last):
File "/usr/local/bin/vllm", line 8, in <module>
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
args.dispatch_function(args)
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 34, in cmd
uvloop.run(run_server(args))
File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
return __asyncio.run(
^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
return await main
^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 928, in run_server
async with build_async_engine_client(args) as engine_client:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 139, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 233, in build_async_engine_client_from_engine_args
raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.

vllm --version
INFO 02-19 19:45:51 __init__.py:197] Automatically detected platform cuda.
0.7.3.dev203+ge2603fef.d20250219

python3 -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
True 12.6
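
The two checks above confirm the vLLM build and that CUDA is visible, but not which optional attention packages made it into the image. A minimal sketch of an extra check that could be run inside the container follows. The helper names `module_available` and `report`, and the package list, are assumptions for illustration and not part of vLLM.

```python
import importlib.util
import platform


def module_available(name: str) -> bool:
    """Return True if `name` can be located on the import path (without importing it)."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises for a dotted name with a missing parent package,
        # or for a module whose __spec__ is unset.
        return False


def report(packages=("torch", "xformers", "flash_attn")):
    """Collect the CPU architecture and the availability of each package.

    The default package list is a hypothetical choice for this issue.
    """
    result = {"machine": platform.machine()}
    for pkg in packages:
        result[pkg] = module_available(pkg)
    return result


if __name__ == "__main__":
    print(report())
```

On the image from this report, `xformers` would be expected to show as `False`, matching the ModuleNotFoundError in the log.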

🐛 Describe the bug

The arm64 image I built from source fails at startup with: No module named 'xformers'
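
Reading the startup log in this report, the two INFO lines from cuda.py suggest the GPU was treated as Volta or Turing, so FlashAttention-2 was skipped and the XFormers backend was chosen. The import then failed because xformers is not installed in the image. A toy model of that chain is sketched below. It is not vLLM's actual selector code: the function name `explain_backend_choice`, the returned strings, and the compute capability 8.0 threshold are assumptions drawn from the log message.

```python
def explain_backend_choice(capability, xformers_installed):
    """Toy model of the fallback visible in the log (not vLLM's real selector).

    capability: (major, minor) CUDA compute capability, e.g. (7, 5) for Turing.
    xformers_installed: whether the xformers package is importable.
    """
    major, _minor = capability
    if major >= 8:
        # Assumption: Ampere and newer can use FlashAttention-2.
        return "FLASH_ATTN"
    if xformers_installed:
        return "XFORMERS"
    # Mirrors the failure in this report: fallback selected, package missing.
    raise ModuleNotFoundError("No module named 'xformers'")
```

On a real machine the capability tuple could come from `torch.cuda.get_device_capability()`. With a pre-Ampere GPU and no xformers package, the sketch raises the same error type seen in the traceback.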
