[Bugfix][Ray] Set the cuda context eagerly in the ray worker #19583
Conversation
Thanks for the fix.
cc @youkaichao, who recently investigated the same problem in a different context, to take a look:
pytorch/pytorch#155668
https://forums.developer.nvidia.com/t/whats-the-expected-behavior-of-calling-cudagetdevice-when-the-process-has-no-cuda-context/335784
.buildkite/test-pipeline.yaml (outdated)
@@ -759,6 +759,7 @@ steps:
   - torchrun --nproc_per_node=2 distributed/test_ca_buffer_sharing.py
   - TARGET_TEST_SUITE=A100 pytest basic_correctness/ -v -s -m 'distributed(num_gpus=2)'
   - pytest -v -s -x lora/test_mixtral.py
+  - pytest -v -s cuda/test_cuda_context.py
Does this need to be under the 4xA100 tests?
vllm/executor/ray_utils.py (outdated)
@@ -113,6 +113,9 @@ def setup_device_if_necessary(self):
             # Not needed
             pass
         else:
+            if current_platform.is_cuda():
+                from vllm.platforms.cuda import set_cuda_context
+                set_cuda_context(self.worker.device)
Should we assert that this returns `True` here?
Why change it here? You just need to change the `set_device` implementation in the CUDA platform.
So a `set_device` method does not explicitly exist on the CUDA `Platform` definition; it goes through `__getattr__`, I think here: https://github.com/vllm-project/vllm/blob/main/vllm/platforms/interface.py#L498? Otherwise I don't know how `set_device` is mapped to `torch.cuda.set_device` on the CUDA `Platform` class. Lacking that understanding, I implemented it separately.
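(For illustration, a minimal sketch of the delegation being described; this is not the actual vLLM code, see the linked `interface.py` for the real implementation:)

```python
import torch

class Platform:
    device_type = "cuda"

    def __getattr__(self, name):
        # Unknown attributes fall through to the torch module for this
        # device type, so `platform.set_device` resolves to
        # `torch.cuda.set_device` without being defined explicitly.
        return getattr(getattr(torch, self.device_type), name)

assert Platform().set_device is torch.cuda.set_device
```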
Yeah, you can add a `set_device` interface that defaults to `torch.xxx.set_device`, and override it in CUDA.
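(A rough sketch of that suggestion; names and placement are assumed here, not the merged implementation:)

```python
import torch

class Platform:
    device_type: str

    def set_device(self, device: torch.device) -> None:
        """Default: delegate to the torch module for this platform,
        e.g. torch.xpu.set_device when device_type == "xpu"."""
        getattr(torch, self.device_type).set_device(device)

class CudaPlatform(Platform):
    device_type = "cuda"

    def set_device(self, device: torch.device) -> None:
        torch.cuda.set_device(device)
        # Eagerly establish the context as well, since
        # torch.cuda.set_device on its own is lazy.
        torch.zeros(1, device=device)
```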
vllm/platforms/cuda.py (outdated)
try:
    # Load CUDA driver library
    cuda = ctypes.CDLL('libcuda.so')
Wondering how robust this is, compared to the workaround of creating a tensor.
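(For context, a reduced sketch of the driver-API approach under discussion. This variant retains the primary context rather than calling `cuCtxCreate`; the signatures are from the public CUDA driver API, and error handling is collapsed to a bool:)

```python
import ctypes
from typing import Union

import torch

CUDA_SUCCESS = 0

def set_cuda_context(device: Union[torch.device, int]) -> bool:
    """Eagerly bind a CUDA context to the calling thread.
    Assumes an indexed device such as torch.device("cuda:0")."""
    try:
        cuda = ctypes.CDLL('libcuda.so')  # CUDA driver library
    except OSError:
        return False
    index = device.index if isinstance(device, torch.device) else device
    if cuda.cuInit(0) != CUDA_SUCCESS:
        return False
    dev = ctypes.c_int()
    if cuda.cuDeviceGet(ctypes.byref(dev), index) != CUDA_SUCCESS:
        return False
    ctx = ctypes.c_void_p()
    # Retain the device's primary context (the one the CUDA runtime and
    # torch share) and make it current on this thread.
    if cuda.cuDevicePrimaryCtxRetain(ctypes.byref(ctx), dev) != CUDA_SUCCESS:
        return False
    return cuda.cuCtxSetCurrent(ctx) == CUDA_SUCCESS
```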
vllm/platforms/cuda.py (outdated)
@@ -50,6 +51,65 @@ def wrapper(*args: _P.args, **kwargs: _P.kwargs) -> _R:
     return wrapper


+def set_cuda_context(device: Union[torch.device, int]) -> bool:
I would suggest just creating a tensor on the target device, and then calling `torch.cuda.set_device`. Creating a context this way might interfere with other functionality; e.g., `cuCtxCreate_v2` might be deprecated in the future (in favor of `cuCtxCreate_v3`), and calling that function directly could easily break other things.
Didn't expect it to occur here too 👀
Cool, I think I've addressed all the comments. Please take another look @youkaichao @ruisearch42
from vllm.platforms import current_platform


def check_cuda_context():
Use `torch._C._cuda_hasPrimaryContext(device: int)`.
I'm not sure that API captures what we're trying to do here. `torch._C._cuda_hasPrimaryContext(0)` returns true even from the newly created background thread, whereas the current method returns false, which is consistent with the problem I was trying to solve.
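(A sketch of that distinction: `cuCtxGetCurrent` is a per-thread check, versus the process-wide primary-context check:)

```python
import ctypes

CUDA_SUCCESS = 0

def check_cuda_context() -> bool:
    """True iff the *calling thread* has a current CUDA context.
    torch._C._cuda_hasPrimaryContext(0) answers a different question:
    whether the device's primary context exists anywhere in the process."""
    try:
        cuda = ctypes.CDLL('libcuda.so')
    except OSError:
        return False
    ctx = ctypes.c_void_p()
    if cuda.cuCtxGetCurrent(ctypes.byref(ctx)) != CUDA_SUCCESS:
        return False
    # cuCtxGetCurrent succeeds with a NULL context when the calling
    # thread has no context bound.
    return ctx.value is not None
```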
tests/cuda/test_cuda_context.py (outdated)
    except Exception as e:
        return False, f"Wrong exception: {type(e).__name__}: {e}"

with ThreadPoolExecutor(max_workers=1) as executor:
Why do we use a thread pool? Just using normal functions starting with `test_` should be fine.
Noted, changed.
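(The resulting test shape might look roughly like this; a hedged sketch only, since a thread is still needed to reproduce the background-thread scenario, and `check_cuda_context` is the helper sketched above:)

```python
import threading

import torch
from vllm.platforms import current_platform

def test_cuda_context_in_background_thread():
    # Sketch: verify the platform's set_device leaves a usable CUDA
    # context even when invoked from a freshly spawned thread, mirroring
    # how Ray executes the model in a background thread.
    result = {}

    def worker():
        current_platform.set_device(torch.device("cuda:0"))
        result["ok"] = check_cuda_context()

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    assert result["ok"]
```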
""" | ||
Set the device for the current platform. | ||
""" | ||
torch.cuda.set_device(device) |
Should this be `getattr(torch, self.device_type)`?
This PR sets the CUDA context eagerly in the Ray actor (`torch.cuda.set_device` is actually lazy, I think).
There was a bug in Ray + P/D where the nixl/ucx CUDA context check, done via direct CUDA driver calls, fails because Ray runs model execution in a background thread that does not inherit the CUDA context.
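(To make the failure mode concrete, a minimal repro sketch; it assumes a CUDA machine, and `has_current_context` is an illustrative helper, not vLLM code:)

```python
import ctypes
import threading

import torch

def has_current_context() -> bool:
    cuda = ctypes.CDLL('libcuda.so')
    ctx = ctypes.c_void_p()
    cuda.cuCtxGetCurrent(ctypes.byref(ctx))
    return ctx.value is not None

torch.cuda.set_device(0)
torch.zeros(1, device="cuda:0")  # force context creation on the main thread
print(has_current_context())     # True on the main thread

t = threading.Thread(target=lambda: print(has_current_context()))
t.start()
t.join()  # prints False: the background thread did not inherit the context,
          # which is why nixl/ucx-style driver-level checks fail there
```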