Description
Your current environment
Model: DeepSeek V3
vLLM: v0.7.1, installed via pip3 install vllm
🐛 Describe the bug
When I launch the server with the following command:

python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 9111 --model /deepseek_v3 --max-num-batched-tokens 16384 --gpu-memory-utilization 0.97 --tensor-parallel-size 8 --disable-log-requests --trust-remote-code --enable-chunked-prefill

it fails with:

RuntimeError: NCCL error 1: unhandled cuda error (run with NCCL_DEBUG=INFO for details)

Model: DeepSeek V3
vLLM: v0.7.1, installed via pip3 install vllm

How can I fix this?
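As the error message itself suggests, a first diagnostic step would be to rerun the same command with NCCL's debug logging enabled so the underlying CUDA error is printed. A sketch of that rerun, reusing the invocation above (NCCL_DEBUG and NCCL_DEBUG_SUBSYS are standard NCCL environment variables; the flags are unchanged from the report):

```shell
# Turn on verbose NCCL logging to surface the underlying CUDA error
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=ALL   # optional: log all NCCL subsystems, not just init

python3 -m vllm.entrypoints.openai.api_server \
  --host 0.0.0.0 --port 9111 \
  --model /deepseek_v3 \
  --max-num-batched-tokens 16384 \
  --gpu-memory-utilization 0.97 \
  --tensor-parallel-size 8 \
  --disable-log-requests \
  --trust-remote-code \
  --enable-chunked-prefill
```

The NCCL log lines then usually name the failing CUDA call (often an out-of-memory or peer-access failure under tensor parallelism), which narrows down whether the fix is lowering --gpu-memory-utilization, checking driver/CUDA compatibility, or adjusting inter-GPU connectivity settings.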
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.