Your current environment
The output of `python collect_env.py`:

```text
==============================
PyTorch Info
==============================
PyTorch version : 2.8.0.dev20250615+rocm6.4
Is debug build : False
CUDA used to build PyTorch : N/A
ROCM used to build PyTorch : 6.4.43482-0f2d60242
==============================
CUDA / GPU Info
==============================
Is CUDA available : True
CUDA runtime version : Could not collect
CUDA_MODULE_LOADING set to : LAZY
GPU models and configuration : AMD Instinct MI300X (gfx942:sramecc+:xnack-)
Nvidia driver version : Could not collect
cuDNN version : Could not collect
HIP runtime version : 6.4.43482
MIOpen runtime version : 3.4.0
Is XNNPACK available : True
==============================
vLLM Info
==============================
ROCM Version : 6.4.43483-a187df25c
Neuron SDK Version : N/A
vLLM Version : 0.9.2.dev95+g26bc46ef8.d20250616 (git sha: 26bc46ef8, date: 20250616)
```
🐛 Describe the bug
The current branch crashes on ROCm with:
`libcuda.so.1: cannot open shared object file`
The issue is that this import guard does not check for HIP: https://github.com/vllm-project/vllm/blob/main/vllm/device_allocator/cumem.py#L48-L64
Replacing it with something like this solves the issue:
```python
cumem_available = False
try:
    if torch.version.hip:
        # torch.version.hip is non-None on ROCm builds of PyTorch
        raise RuntimeError("Skipping CuMemAllocator on ROCm platform")
    from vllm.cumem_allocator import (init_module, python_create_and_map,
                                      python_unmap_and_release)
    from vllm.distributed.device_communicators.cuda_wrapper import (
        CudaRTLibrary)
    lib_name = find_loaded_library("cumem_allocator")
    libcudart = CudaRTLibrary()
    cumem_available = lib_name is not None
except Exception:
    # ROCm platform does not support the cumem allocator
    init_module = None
    python_create_and_map = None
    python_unmap_and_release = None
    CudaRTLibrary = None
    lib_name = None
    libcudart = None
```
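As a hypothetical alternative to relying on `torch.version.hip`, the same guard could probe the dynamic loader for the CUDA driver library directly. This is only a sketch (the helper name is illustrative, not a vLLM API); it uses `ctypes.util.find_library`, which returns `None` when `libcuda` is not on the loader's search path, as on a ROCm-only host:

```python
import ctypes.util


def cuda_driver_present() -> bool:
    """Return True if the CUDA driver library (libcuda) can be located.

    ctypes.util.find_library returns None when the library is not on the
    dynamic loader's search path, which is the case on ROCm-only hosts,
    so this check avoids the "libcuda.so.1: cannot open shared object
    file" crash before any CUDA-specific import is attempted.
    """
    return ctypes.util.find_library("cuda") is not None
```

Either approach reaches the same fallback; the exception-based version above has the advantage of also catching import failures unrelated to the platform.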
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.