Your current environment

The output of `python collect_env.py`

🐛 Describe the bug
It seems the executor always selects GPUs 0, 1, 2, 3, ... as the devices for the Ray workers. This makes it impossible for a user to assign devices with export CUDA_VISIBLE_DEVICES=2,3 or os.environ["CUDA_VISIBLE_DEVICES"] = "2,3".
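For example, a minimal reproduction might look like the sketch below (the model name and tensor_parallel_size are placeholders; the only thing that matters is forcing the Ray backend via distributed_executor_backend="ray"):

import os

# The user restricts the process to GPUs 2 and 3 before importing vLLM...
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"

from vllm import LLM

# ...but with the Ray backend the workers still end up on GPUs 0 and 1,
# because the executor rewrites CUDA_VISIBLE_DEVICES (see below).
llm = LLM(model="facebook/opt-125m",
          tensor_parallel_size=2,
          distributed_executor_backend="ray")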
The bug appears to come from the following code in vllm/vllm/executor/ray_distributed_executor.py:
worker_node_and_gpu_ids = []
for worker in [self.driver_dummy_worker] + self.workers:
    if worker is None:
        # driver_dummy_worker can be None when using ray spmd worker.
        continue
    worker_node_and_gpu_ids.append(
        ray.get(worker.get_node_and_gpu_ids.remote()))  # type: ignore

for i, (node_id, gpu_ids) in enumerate(worker_node_and_gpu_ids):
    node_workers[node_id].append(i)
    # `gpu_ids` can be a list of strings or integers.
    # convert them to integers for consistency.
    # NOTE: gpu_ids can be larger than 9 (e.g. 16 GPUs),
    # string sorting is not sufficient.
    # see https://github.com/vllm-project/vllm/issues/5590
    gpu_ids = [int(x) for x in gpu_ids]
    node_gpus[node_id].extend(gpu_ids)

for node_id, gpu_ids in node_gpus.items():
    node_gpus[node_id] = sorted(gpu_ids)

# Set environment variables for the driver and workers.
all_args_to_update_environment_variables = [{
    current_platform.device_control_env_var:
    ",".join(map(str, node_gpus[node_id])),
} for (node_id, _) in worker_node_and_gpu_ids]
Printing worker_node_and_gpu_ids, node_gpus, and all_args_to_update_environment_variables shows that the workers are always handed GPUs 0, 1, 2, 3, and so on.
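For illustration, on a hypothetical single node with four Ray workers, the printed values would look roughly like this (the node ID is abbreviated; the actual output depends on the Ray placement group):

worker_node_and_gpu_ids = [('a1b2c3', [0]), ('a1b2c3', [1]),
                           ('a1b2c3', [2]), ('a1b2c3', [3])]
node_gpus = {'a1b2c3': [0, 1, 2, 3]}
# Every worker on the node receives the same device string:
all_args_to_update_environment_variables = [
    {'CUDA_VISIBLE_DEVICES': '0,1,2,3'},
    {'CUDA_VISIBLE_DEVICES': '0,1,2,3'},
    {'CUDA_VISIBLE_DEVICES': '0,1,2,3'},
    {'CUDA_VISIBLE_DEVICES': '0,1,2,3'},
]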
Then, in the function update_environment_variables, the existing 'CUDA_VISIBLE_DEVICES' in os.environ is overwritten by these settings:
def update_environment_variables(self, envs_list: List[Dict[str, str]]) -> None:
    envs = envs_list[self.rpc_rank]
    key = 'CUDA_VISIBLE_DEVICES'
    if key in envs and key in os.environ:
        # overwriting CUDA_VISIBLE_DEVICES is desired behavior
        # suppress the warning in `update_environment_variables`
        del os.environ[key]
    update_environment_variables(envs)
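A self-contained sketch of the effect (the helper below is a simplified stand-in for vLLM's module-level update_environment_variables, and the values are hypothetical):

import os
from typing import Dict, List

def update_environment_variables(envs: Dict[str, str]) -> None:
    # Simplified stand-in: copy each key into os.environ unconditionally.
    for k, v in envs.items():
        os.environ[k] = v

# The user asks for GPUs 2 and 3...
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"

# ...but the executor later pushes Ray's assigned IDs to every worker.
envs_list: List[Dict[str, str]] = [{"CUDA_VISIBLE_DEVICES": "0,1"}]
update_environment_variables(envs_list[0])

print(os.environ["CUDA_VISIBLE_DEVICES"])  # prints "0,1"; the user's setting is gone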
Issues #14334 and #14191 report similar problems. It seems the Ray workers do not give users any chance to change the visible GPU devices.