[feat] Support cross-job actor discovery via explicit namespace #115

Merged
0oshowero0 merged 1 commit into Ascend:main from huniu20:feat/cross-job-namespace on Jun 5, 2026
Conversation

@huniu20 huniu20 commented Jun 4, 2026

Contributor

When multiple Ray Jobs share the same Ray cluster, Named Actors are isolated by namespace. Without an explicit namespace, a TQ Controller created by one job is invisible to workers in another job.

This commit adds namespace="transfer_queue" to both:

  • ray.get_actor() in _init_from_existing()
  • TransferQueueController.options() in init()

This ensures that the TQ Controller is always registered and discovered in the fixed "transfer_queue" namespace, enabling cross-job TQ sharing (e.g., a teacher server job creates TQ, and a trainer job connects to it).

This change is backward-compatible: single-job usage is unaffected since the namespace is consistent between creation and discovery.
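
The isolation described above can be illustrated with a small pure-Python model. This is only a sketch and does not use Ray: `NamedActorRegistry` and `Job` are hypothetical names introduced for illustration. It assumes, as Ray does when no namespace is given, that each job falls back to its own anonymous (unique) namespace, so named actors are effectively keyed by the pair (namespace, name).

```python
import uuid


class NamedActorRegistry:
    """Toy stand-in for a cluster-wide named-actor table (hypothetical, not Ray)."""

    def __init__(self):
        self._actors = {}  # maps (namespace, name) -> actor

    def register(self, name, actor, namespace):
        key = (namespace, name)
        if key in self._actors:
            raise ValueError(f"actor {name!r} already exists in namespace {namespace!r}")
        self._actors[key] = actor

    def get(self, name, namespace):
        try:
            return self._actors[(namespace, name)]
        except KeyError:
            raise ValueError(f"no actor {name!r} in namespace {namespace!r}") from None


class Job:
    """Toy job: without an explicit namespace it uses its own anonymous one."""

    def __init__(self, registry):
        self.registry = registry
        self.default_namespace = str(uuid.uuid4())  # unique per job

    def create_actor(self, name, actor, namespace=None):
        self.registry.register(name, actor, namespace or self.default_namespace)

    def get_actor(self, name, namespace=None):
        return self.registry.get(name, namespace or self.default_namespace)


cluster = NamedActorRegistry()
teacher, trainer = Job(cluster), Job(cluster)

# Without an explicit namespace, the trainer job cannot see the teacher's actor.
teacher.create_actor("TransferQueueController", "controller-A")
try:
    trainer.get_actor("TransferQueueController")
    visible_without_namespace = True
except ValueError:
    visible_without_namespace = False

# With a fixed namespace on both sides, discovery works across jobs.
teacher.create_actor("TransferQueueController", "controller-B", namespace="transfer_queue")
found = trainer.get_actor("TransferQueueController", namespace="transfer_queue")

print(visible_without_namespace)  # False
print(found)                      # controller-B
```

Single-job usage keeps working in this model for the same reason given above: creation and lookup agree on the namespace, whether it is the anonymous default or the fixed one.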

@ascend-robot

CLA Signature Guide

@huniu20, thanks for your pull request.

The following commit(s) are not associated with a signed Contributor License Agreement (CLA).

| Commit | Reason |
| --- | --- |
| [5a1900d [feat] Support cross-job actor ...](5a1900d) | The email used in the commit is not linked to a signed CLA. Please verify that it matches the email you used when signing the CLA. |

To sign the CLA, click here.

To check whether your email is configured correctly, refer to the FAQs.

Once you've signed the CLA or updated your email, please comment /check-cla to revalidate the CLA status.

@ji-huazhong ji-huazhong self-requested a review June 4, 2026 08:58
@ji-huazhong

Collaborator

Thanks for your contribution. Please resolve the failing CI tests first. @huniu20

@huniu20 huniu20 force-pushed the feat/cross-job-namespace branch from 5a1900d to d02f41b Compare June 4, 2026 09:09
@ascend-robot

CLA Signature Guide

@huniu20, thanks for your pull request.

The following commit(s) are not associated with a signed Contributor License Agreement (CLA).

| Commit | Reason |
| --- | --- |
| [d02f41b [feat] Support cross-job actor ...](d02f41b) | The email used in the commit is not linked to a signed CLA. Please verify that it matches the email you used when signing the CLA. |

To sign the CLA, click here.

To check whether your email is configured correctly, refer to the FAQs.

Once you've signed the CLA or updated your email, please comment /check-cla to revalidate the CLA status.

huniu20 commented Jun 4, 2026

Contributor Author

/check-cla

@ascend-robot

CLA Signature Pass

huniu20, thanks for your pull request. All authors of the commits have signed the CLA. 👍

@huniu20 huniu20 force-pushed the feat/cross-job-namespace branch from d02f41b to 9af6f55 Compare June 4, 2026 09:51

Copilot AI left a comment

Contributor

Pull request overview

This PR makes TransferQueue’s Ray named actors discoverable across multiple Ray Jobs sharing the same cluster by consistently using a fixed Ray namespace ("transfer_queue") when creating and retrieving the controller (and the Ray storage actor).

Changes:

  • Add namespace="transfer_queue" to ray.get_actor("TransferQueueController", ...) call sites (library + tests + examples).
  • Create TransferQueueController (and RayObjectRefStorage) explicitly in the "transfer_queue" namespace via .options(namespace=...).
  • Update E2E tests and tutorial/demo code to retrieve the controller from the fixed namespace.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tutorial/06_streaming_dataloader.py | Fetches the controller from the fixed Ray namespace in the tutorial worker path. |
| transfer_queue/storage/clients/ray_storage_client.py | Fetches/creates RayObjectRefStorage in the fixed Ray namespace to enable cross-job access. |
| transfer_queue/interface.py | Fetches/creates TransferQueueController in the fixed Ray namespace for cross-job discovery. |
| tests/e2e/test_kv_interface_e2e.py | Updates test fixture to retrieve controller from the fixed namespace. |
| recipe/simple_use_case/relax_demo.py | Updates demo worker to retrieve controller from the fixed namespace. |

  try:
      if _TQ_CONTROLLER is None:
-         _TQ_CONTROLLER = ray.get_actor("TransferQueueController")
+         _TQ_CONTROLLER = ray.get_actor("TransferQueueController", namespace="transfer_queue")

  # initialize actor
  try:
-     self.storage_actor = ray.get_actor("RayObjectRefStorage")
+     self.storage_actor = ray.get_actor("RayObjectRefStorage", namespace="transfer_queue")
When multiple Ray Jobs share the same Ray cluster, Named Actors are
isolated by namespace. Without an explicit namespace, a TQ Controller
created by one job is invisible to workers in another job.

This commit adds namespace="transfer_queue" to both:
- ray.get_actor() in _init_from_existing()
- TransferQueueController.options() in init()

This ensures that the TQ Controller is always registered and discovered
in the fixed "transfer_queue" namespace, enabling cross-job TQ sharing
(e.g., a teacher server job creates TQ, and a trainer job connects to it).

This change is backward-compatible: single-job usage is unaffected since
the namespace is consistent between creation and discovery.

Signed-off-by: huniu20 <huniumail@gmail.com>
@huniu20 huniu20 force-pushed the feat/cross-job-namespace branch from 9af6f55 to b20656a Compare June 4, 2026 10:06

@0oshowero0 0oshowero0 merged commit 136ec16 into Ascend:main Jun 5, 2026
8 checks passed