[Installation]: The startup failed, and it might be related to xformers. #13279

@zeliu

Description

Your current environment

PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: CentOS Linux release 7.9.2009 (Core) (x86_64)
GCC version: (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9)
Clang version: Could not collect
CMake version: version 3.26.4
Libc version: glibc-2.17

Python version: 3.12.9 | packaged by Anaconda, Inc. | (main, Feb  6 2025, 18:56:27) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-3.10.0-1160.92.1.el7.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Tesla V100-PCIE-32GB
Nvidia driver version: 535.183.06
cuDNN version: Probably one of the following:
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn.so.8.9.6
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.6
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.6
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.6
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.9.6
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.9.6
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.9.6
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6151 CPU @ 3.00GHz
Stepping:              4
CPU MHz:               3000.000
BogoMIPS:              6000.00
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              25344K
NUMA node0 CPU(s):     0-7
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd rsb_ctxsw ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat md_clear spec_ctrl intel_stibp flush_l1d

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-cusparselt-cu12==0.6.2
[pip3] nvidia-ml-py==12.570.86
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] pyzmq==26.2.1
[pip3] torch==2.5.1
[pip3] torchvision==0.20.1
[pip3] transformers==4.48.3
[pip3] triton==3.1.0
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] nvidia-cublas-cu12        12.4.5.8                 pypi_0    pypi
[conda] nvidia-cuda-cupti-cu12    12.4.127                 pypi_0    pypi
[conda] nvidia-cuda-nvrtc-cu12    12.4.127                 pypi_0    pypi
[conda] nvidia-cuda-runtime-cu12  12.4.127                 pypi_0    pypi
[conda] nvidia-cudnn-cu12         9.1.0.70                 pypi_0    pypi
[conda] nvidia-cufft-cu12         11.2.1.3                 pypi_0    pypi
[conda] nvidia-curand-cu12        10.3.5.147               pypi_0    pypi
[conda] nvidia-cusolver-cu12      11.6.1.9                 pypi_0    pypi
[conda] nvidia-cusparse-cu12      12.3.1.170               pypi_0    pypi
[conda] nvidia-cusparselt-cu12    0.6.2                    pypi_0    pypi
[conda] nvidia-ml-py              12.570.86                pypi_0    pypi
[conda] nvidia-nccl-cu12          2.21.5                   pypi_0    pypi
[conda] nvidia-nvjitlink-cu12     12.4.127                 pypi_0    pypi
[conda] nvidia-nvtx-cu12          12.4.127                 pypi_0    pypi
[conda] pyzmq                     26.2.1                   pypi_0    pypi
[conda] torch                     2.5.1                    pypi_0    pypi
[conda] torchvision               0.20.1                   pypi_0    pypi
[conda] transformers              4.48.3                   pypi_0    pypi
[conda] triton                    3.1.0                    pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.4.post1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0	CPU Affinity	NUMA Affinity
GPU0	 X 	0-7	0

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

LD_LIBRARY_PATH=/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/cv2/../../lib64:
CUDA_MODULE_LOADING=LAZY

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 535.183.06   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-PCIE-32GB            Off| 00000000:00:0D.0 Off |                    0 |
| N/A   31C    P0               35W / 250W|      0MiB / 32768MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

How you are installing vllm

pip install vllm

VLLM_USE_MODELSCOPE=True python -m vllm.entrypoints.api_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-14B --max_model 4096 --port 8080
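Before launching, it can help to confirm that the installed torch and xformers wheels target the same build. A minimal inspection sketch (assumes the same `vllm-py312` conda environment as above is active):

```shell
# Print the installed torch version; the xformers wheel must be built against
# this exact torch release (here 2.5.1+cu124), or its C++ extension fails to load.
python -c "import torch; print('torch', torch.__version__)"
pip show xformers | grep -i '^version'
```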

The error is:

WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.6.0+cu124 with CUDA 1201 (you have 2.5.1+cu124)
Python 3.12.9 (you have 3.12.9)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Traceback (most recent call last):
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/xformers/_cpp_lib.py", line 132, in _register_extensions
torch.ops.load_library(ext_specs.origin)
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/torch/_ops.py", line 1350, in load_library
ctypes.CDLL(path)
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/ctypes/__init__.py", line 379, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/xformers/_C.so: undefined symbol: _ZNK3c1011StorageImpl27throw_data_ptr_access_errorEv

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/xformers/_cpp_lib.py", line 142, in <module>
_build_metadata = _register_extensions()
^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/xformers/_cpp_lib.py", line 134, in _register_extensions
raise xFormersInvalidLibException(build_metadata) from exc
xformers._cpp_lib.xFormersInvalidLibException: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.6.0+cu124 with CUDA 1201 (you have 2.5.1+cu124)
Python 3.12.9 (you have 3.12.9)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/vllm/entrypoints/api_server.py", line 163, in <module>
asyncio.run(run_server(args))
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/vllm/entrypoints/api_server.py", line 119, in run_server
app = await init_app(args, llm_engine)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/vllm/entrypoints/api_server.py", line 107, in init_app
if llm_engine is not None else AsyncLLMEngine.from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 691, in from_engine_args
engine = cls(
^^^^
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 578, in __init__
self.engine = self._engine_class(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 264, in __init__
super().__init__(*args, **kwargs)
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 347, in __init__
self.model_executor = executor_class(vllm_config=vllm_config, )
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 36, in __init__
self._init_executor()
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/vllm/executor/gpu_executor.py", line 38, in _init_executor
self.driver_worker = self._create_worker()
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/vllm/executor/gpu_executor.py", line 96, in _create_worker
return create_worker(**self._get_create_worker_kwargs(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/vllm/executor/gpu_executor.py", line 24, in create_worker
wrapper.init_worker(**kwargs)
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 465, in init_worker
self.worker = worker_class(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/vllm/worker/worker.py", line 82, in __init__
self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
^^^^^^^^^^^^^^^^^
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/vllm/worker/model_runner.py", line 1029, in __init__
self.attn_backend = get_attn_backend(
^^^^^^^^^^^^^^^^^
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/vllm/attention/selector.py", line 105, in get_attn_backend
return _cached_get_attn_backend(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/vllm/attention/selector.py", line 145, in _cached_get_attn_backend
from vllm.attention.backends.xformers import ( # noqa: F401
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/vllm/attention/backends/xformers.py", line 6, in <module>
from xformers import ops as xops
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/xformers/ops/__init__.py", line 35, in <module>
from .sp24 import Sparse24Tensor, sparsify24, sparsify24_like
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/xformers/ops/sp24.py", line 77, in <module>
_cusplt_version = _get_cusparselt_torch_version()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/xformers/ops/sp24.py", line 65, in _get_cusparselt_torch_version
lib = ctypes.CDLL(lib_path)
^^^^^^^^^^^^^^^^^^^^^
File "/mnt/anaconda3/envs/vllm-py312/lib/python3.12/ctypes/__init__.py", line 379, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /mnt/anaconda3/envs/vllm-py312/lib/python3.12/site-packages/xformers/_C.so: undefined symbol: _ZNK3c1011StorageImpl27throw_data_ptr_access_errorEv
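The warning at the top of the trace states the root cause: the installed xformers wheel was built against PyTorch 2.6.0, while this environment has 2.5.1, so `_C.so` references a symbol the older libtorch does not export. A hedged sketch of two common fixes (the pinned version number is an assumption; verify against the xformers release notes which build matches torch 2.5.1):

```shell
# Option 1: pin xformers to the release built for torch 2.5.1
# (0.0.28.post3 is assumed here -- check the xformers changelog before installing)
pip install 'xformers==0.0.28.post3'

# Option 2: upgrade vLLM, letting it pull in a mutually consistent
# torch/xformers pair as declared in its dependency metadata
pip install -U vllm
```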

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.


Labels: installation (Installation problems), stale (Over 90 days of inactivity)
