Skip to content

[fix] Convert TransferQueueController to non-detached Ray actor to prevent resource leaks#43

Merged
0oshowero0 merged 2 commits intoAscend:mainfrom
0oshowero0:remove_detach_actor
Mar 5, 2026
Merged

[fix] Convert TransferQueueController to non-detached Ray actor to prevent resource leaks#43
0oshowero0 merged 2 commits intoAscend:mainfrom
0oshowero0:remove_detach_actor

Conversation

@0oshowero0
Copy link
Copy Markdown
Collaborator

@0oshowero0 0oshowero0 commented Mar 5, 2026

Background

Previously, interface.py created a detached Ray actor for the TransferQueueController. Although a tq.close() method was provided to terminate the actor, it might not be executed in the event of an unexpected crash or error. This could cause the detached actor to persist indefinitely, leading to resource leaks.

Changes

  1. Lifecycle Management: Converted TransferQueueController from a detached actor to a standard (non-detached) actor. Its lifecycle is now tied to the driver process.
  2. Instance Handling: Utilized a global variable to store and manage the actor handle.
  3. Cleanup Logic: Improved the robustness of the tq.close() method.

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>
Copilot AI review requested due to automatic review settings March 5, 2026 03:23
@ascend-robot
Copy link
Copy Markdown

CLA Signature Pass

0oshowero0, thanks for your pull request. All authors of the commits have signed the CLA. 👍

@0oshowero0 0oshowero0 changed the title [fix] Convert detached TransferQueueController to normal ray actor [fix] Convert TransferQueueController to non-detached Ray actor to prevent resource leaks Mar 5, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR changes how TransferQueueController is created and tracked so it is no longer a detached Ray actor, reducing the risk of orphaned controllers when tq.close() is not executed.

Changes:

  • Introduces a module-level _TRANSFER_QUEUE_CONTROLLER handle to store the controller actor reference.
  • Creates TransferQueueController as a normal (non-detached) named actor and reuses the cached handle where possible.
  • Updates close() to attempt to kill the controller via the cached handle.
Comments suppressed due to low confidence (1)

transfer_queue/interface.py:252

  • close() now attempts to kill the controller twice: first via the cached handle and then again by fetching the named actor. This is redundant work and makes the shutdown logic harder to reason about (and can produce avoidable errors/logs). Consider a single best-effort kill path (e.g., kill cached handle if present else try ray.get_actor()).
    if _TRANSFER_QUEUE_CONTROLLER:
        ray.kill(_TRANSFER_QUEUE_CONTROLLER)
        _TRANSFER_QUEUE_CONTROLLER = None

    try:
        controller = ray.get_actor("TransferQueueController")
        ray.kill(controller)
    except Exception:

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread transfer_queue/interface.py
Comment thread transfer_queue/interface.py
Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>
@ascend-robot
Copy link
Copy Markdown

CLA Signature Pass

0oshowero0, thanks for your pull request. All authors of the commits have signed the CLA. 👍

@0oshowero0 0oshowero0 merged commit a195dd5 into Ascend:main Mar 5, 2026
5 checks passed
@0oshowero0 0oshowero0 deleted the remove_detach_actor branch March 20, 2026 01:20
0oshowero0 added a commit that referenced this pull request Mar 30, 2026
Follow PR #43 to use
non-detached Ray actor for `SimpleStorageUnit`

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants