
[Bug]: [AMD] [vLLM=0.7.3] ValueError: Model architectures ['Qwen2_5_VLForConditionalGeneration'] failed to be inspected. #14983

Open
iraj465 opened this issue Mar 17, 2025 · 2 comments
Labels
bug Something isn't working

Comments


iraj465 commented Mar 17, 2025

Your current environment

The output of `python collect_env.py`:
Collecting environment information...
PyTorch version: 2.7.0a0+git3a58512
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 6.3.42133-1b9c17779

OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 18.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-6.3.1 24491 1e0fda770a2079fbd71e4b70974d74f62fd3af10)
CMake version: version 3.31.4
Libc version: glibc-2.35

Python version: 3.12.8 (main, Dec  4 2024, 08:54:12) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: AMD Instinct MI300X (gfx942:sramecc+:xnack-)
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: 6.3.42133
MIOpen runtime version: 3.3.0
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==26.2.0
[pip3] torch==2.7.0a0+git3a58512
[pip3] torchdata==0.11.0
[pip3] torchvision==0.19.1a0+6194369
[pip3] transformers==4.50.0.dev0
[pip3] transformers==4.49.0
[pip3] triton==3.2.0+gite5be006a
[conda] Could not collect
ROCM Version: 6.3.42133-1b9c17779
Neuron SDK Version: N/A
vLLM Version: 0.7.4.dev332+gaf40d336b

🐛 Describe the bug

I followed the AMD doc.

The example scripts for GRPO on LLMs ran fine. However, when I run the VLM example, `bash examples/grpo_trainer/run_qwen2_5_vl-7b.sh`, I get this error:

(main_task pid=294091)     self.multimodal_config = self._init_multimodal_config(
(main_task pid=294091)                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=294091)   File "/usr/local/lib/python3.12/dist-packages/vllm-0.7.4.dev332+gaf40d336b.rocm631-py3.12-linux-x86_64.egg/vllm/config.py", line 460, in _init_multimodal_config
(main_task pid=294091)     if self.registry.is_multimodal_model(self.architectures):
(main_task pid=294091)        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=294091)   File "/usr/local/lib/python3.12/dist-packages/vllm-0.7.4.dev332+gaf40d336b.rocm631-py3.12-linux-x86_64.egg/vllm/model_executor/models/registry.py", line 478, in is_multimodal_model
(main_task pid=294091)     model_cls, _ = self.inspect_model_cls(architectures)
(main_task pid=294091)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=294091)   File "/usr/local/lib/python3.12/dist-packages/vllm-0.7.4.dev332+gaf40d336b.rocm631-py3.12-linux-x86_64.egg/vllm/model_executor/models/registry.py", line 438, in inspect_model_cls
(main_task pid=294091)     return self._raise_for_unsupported(architectures)
(main_task pid=294091)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(main_task pid=294091)   File "/usr/local/lib/python3.12/dist-packages/vllm-0.7.4.dev332+gaf40d336b.rocm631-py3.12-linux-x86_64.egg/vllm/model_executor/models/registry.py", line 390, in _raise_for_unsupported
(main_task pid=294091)     raise ValueError(
(main_task pid=294091) ValueError: Model architectures ['Qwen2VLForConditionalGeneration'] failed to be inspected. Please check the logs for more details.
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330] Error in inspecting model architecture 'Qwen2VLForConditionalGeneration'
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330] Traceback (most recent call last): [repeated 3x across cluster]
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]   File "/usr/local/lib/python3.12/dist-packages/vllm-0.7.4.dev332+gaf40d336b.rocm631-py3.12-linux-x86_64.egg/vllm/model_executor/models/registry.py", line 556, in _run_in_subprocess [repeated 2x across cluster]
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]     returned.check_returncode()
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]   File "/usr/lib/python3.12/subprocess.py", line 502, in check_returncode
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]     raise CalledProcessError(self.returncode, self.args, self.stdout,
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330] subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'vllm.model_executor.models.registry']' returned non-zero exit status 1.
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]  [repeated 3x across cluster]
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330] The above exception was the direct cause of the following exception:
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]   File "/usr/local/lib/python3.12/dist-packages/vllm-0.7.4.dev332+gaf40d336b.rocm631-py3.12-linux-x86_64.egg/vllm/model_executor/models/registry.py", line 328, in _try_inspect_model_cls
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]     return model.inspect_model_cls()
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]   File "/usr/local/lib/python3.12/dist-packages/vllm-0.7.4.dev332+gaf40d336b.rocm631-py3.12-linux-x86_64.egg/vllm/model_executor/models/registry.py", line 299, in inspect_model_cls
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]     return _run_in_subprocess(
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]            ^^^^^^^^^^^^^^^^^^^
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]     raise RuntimeError(f"Error raised in subprocess:\n"
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330] RuntimeError: Error raised in subprocess:
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]   File "<frozen runpy>", line 189, in _run_module_as_main
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]   File "<frozen runpy>", line 112, in _get_module_details
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]   File "/usr/local/lib/python3.12/dist-packages/vllm-0.7.4.dev332+gaf40d336b.rocm631-py3.12-linux-x86_64.egg/vllm/platforms/rocm.py", line 70, in <module> [repeated 7x across cluster]
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]     from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]     from vllm.executor.executor_base import ExecutorBase
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]     from vllm.model_executor.layers.sampler import SamplerOutput
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]     from vllm.spec_decode.metrics import SpecDecodeWorkerMetrics
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]     from vllm.model_executor.layers.spec_decode_base_sampler import (
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]     from vllm.platforms import current_platform
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]   File "<frozen importlib._bootstrap>", line 1412, in _handle_fromlist
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]   File "/usr/local/lib/python3.12/dist-packages/vllm-0.7.4.dev332+gaf40d336b.rocm631-py3.12-linux-x86_64.egg/vllm/platforms/__init__.py", line 288, in __getattr__
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]     _current_platform = resolve_obj_by_qualname(
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]                         ^^^^^^^^^^^^^^^^^^^^^^^^
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]   File "/usr/local/lib/python3.12/dist-packages/vllm-0.7.4.dev332+gaf40d336b.rocm631-py3.12-linux-x86_64.egg/vllm/utils.py", line 2092, in resolve_obj_by_qualname
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]     module = importlib.import_module(module_name)
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]   File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]     return _bootstrap._gcd_import(name[level:], package, level)
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]     assert val == cuda_val
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330]            ^^^^^^^^^^^^^^^
(WorkerDict pid=307390) ERROR 03-17 19:56:11 [registry.py:330] AssertionError
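Note: the traceback bottoms out in a bare `assert val == cuda_val` raised while importing vLLM's ROCm platform module. A plausible reading (an assumption on my part, not confirmed against the vLLM source) is that it compares device-visibility values derived from `HIP_VISIBLE_DEVICES` and `CUDA_VISIBLE_DEVICES`. A standalone sketch of that kind of consistency check, with a hypothetical helper name:

```python
import os

def visible_devices_consistent(env):
    """Hypothetical check: True when HIP_VISIBLE_DEVICES and
    CUDA_VISIBLE_DEVICES cannot conflict (one is unset, or both agree)."""
    hip = env.get("HIP_VISIBLE_DEVICES")
    cuda = env.get("CUDA_VISIBLE_DEVICES")
    if hip is None or cuda is None:
        # Only one (or neither) variable is set, so there is nothing to conflict.
        return True
    return hip == cuda

print(visible_devices_consistent({"HIP_VISIBLE_DEVICES": "0,1",
                                  "CUDA_VISIBLE_DEVICES": "0,1"}))  # True
print(visible_devices_consistent({"HIP_VISIBLE_DEVICES": "0",
                                  "CUDA_VISIBLE_DEVICES": "0,1"}))  # False
```

Running this against `os.environ` in the launch environment would show whether the two variables disagree before vLLM ever imports.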

@iraj465 iraj465 added the bug Something isn't working label Mar 17, 2025

iraj465 commented Mar 17, 2025

Note: This uses the base image from `docker pull rocm/vllm:instinct_main` (vLLM 0.7.3). The rest of the steps are the same as provided in `Dockerfile.rocm`.

DarkLight1337 (Member) commented:

Did you set HIP_VISIBLE_DEVICES and CUDA_VISIBLE_DEVICES at the same time? If so, make sure they are consistent with each other.
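Assuming conflicting values are indeed the cause, a hedged shell sketch of the fix (the device IDs `0,1` are placeholders for whatever GPUs you intend to use):

```shell
# Either export both variables with identical values...
export HIP_VISIBLE_DEVICES=0,1
export CUDA_VISIBLE_DEVICES=0,1

# ...or unset one entirely so a single variable controls device visibility.
unset CUDA_VISIBLE_DEVICES
```

Setting these in the shell that launches the training script (before Ray spawns its workers) should ensure every worker sees a consistent view.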
