Description
Your current environment
PyTorch version: 2.6.0+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A
OS: CentOS Linux 7 (Core) (x86_64)
GCC version: (GCC) 12.2.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.17
Python version: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-3.10.0-1160.el7.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 12.2.140
🐛 Describe the bug
I am running vLLM 0.8.2 with torch 2.6.0 and launching the qwen2.5-32n-int4 model. The launch command is:
python -m vllm.entrypoints.openai.api_server \
    --served-model-name qwen2.5-32n-int4 \
    --model qwen2.5-32n-int4 \
    --tensor-parallel-size 2 \
    --port 8019 \
    --dtype float16 \
    --enforce-eager \
    --trust-remote-code \
    --gpu-memory-utilization 0.7 \
    --max-model-len 3200
The error output is:
(VllmWorker rank=0 pid=10003) INFO 04-02 10:00:12 [backends.py:415] Using cache directory: /root/.cache/vllm/torch_compile_cache/68e5addbf5/rank_0_0 for vLLM's torch.compile
(VllmWorker rank=0 pid=10003) INFO 04-02 10:00:13 [backends.py:425] Dynamo bytecode transform time: 18.08 s
(VllmWorker rank=1 pid=10014) INFO 04-02 10:00:13 [backends.py:415] Using cache directory: /root/.cache/vllm/torch_compile_cache/68e5addbf5/rank_1_0 for vLLM's torch.compile
(VllmWorker rank=1 pid=10014) INFO 04-02 10:00:13 [backends.py:425] Dynamo bytecode transform time: 18.53 s
gcc: fatal error: cannot execute 'cc1': execvp: No such file or directory
compilation terminated.
ERROR 04-02 10:00:15 [core.py:343] EngineCore hit an exception: Traceback (most recent call last):
ERROR 04-02 10:00:15 [core.py:343] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 335, in run_engine_core
ERROR 04-02 10:00:15 [core.py:343] engine_core = EngineCoreProc(*args, **kwargs)
ERROR 04-02 10:00:15 [core.py:343] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 290, in __init__
ERROR 04-02 10:00:15 [core.py:343] super().__init__(vllm_config, executor_class, log_stats)
ERROR 04-02 10:00:15 [core.py:343] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 63, in __init__
ERROR 04-02 10:00:15 [core.py:343] num_gpu_blocks, num_cpu_blocks = self._initialize_kv_caches(
ERROR 04-02 10:00:15 [core.py:343] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 122, in _initialize_kv_caches
ERROR 04-02 10:00:15 [core.py:343] available_gpu_memory = self.model_executor.determine_available_memory()
ERROR 04-02 10:00:15 [core.py:343] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 66, in determine_available_memory
ERROR 04-02 10:00:15 [core.py:343] output = self.collective_rpc("determine_available_memory")
ERROR 04-02 10:00:15 [core.py:343] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 134, in collective_rpc
ERROR 04-02 10:00:15 [core.py:343] raise e
ERROR 04-02 10:00:15 [core.py:343] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 118, in collective_rpc
ERROR 04-02 10:00:15 [core.py:343] status, result = w.worker_response_mq.dequeue(
ERROR 04-02 10:00:15 [core.py:343] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 471, in dequeue
ERROR 04-02 10:00:15 [core.py:343] obj = pickle.loads(buf[1:])
ERROR 04-02 10:00:15 [core.py:343] TypeError: BackendCompilerFailed.__init__() missing 1 required positional argument: 'inner_exception'
ERROR 04-02 10:00:15 [core.py:343]
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] WorkerProc hit an exception: %s
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] Traceback (most recent call last):
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 372, in worker_busy_loop
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] output = func(*args, **kwargs)
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] return func(*args, **kwargs)
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] self.model_runner.profile_run()
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1499, in profile_run
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] hidden_states = self._dummy_run(self.max_num_tokens)
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] return func(*args, **kwargs)
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1336, in _dummy_run
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] hidden_states = model(
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/models/qwen2.py", line 462, in forward
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] hidden_states = self.model(input_ids, positions, intermediate_tensors,
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/compilation/decorators.py", line 238, in __call__
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] output = self.compiled_callable(*args, **kwargs)
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] return fn(*args, **kwargs)
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] return self._torchdynamo_orig_callable(
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] return _compile(
(VllmWorker rank=0 pid=10003) ERROR 04-02 10:00:15 [multiproc_executor.py:379] File "/root/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile
CRITICAL 04-02 10:00:15 [core_client.py:269] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
Killed
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.