[agent][Fix] Fix SkyRLAgentPPOTrainer after switch to async #1237

Merged
SumanthRH merged 3 commits into main from fix-async-skyrlagent on Feb 28, 2026
@SumanthRH SumanthRH commented Feb 28, 2026

What does this PR do?

Fixes SkyRLAgentPPOTrainer after #1235. Previously, SkyRLAgentPPOTrainer.train was a sync function, even though the base class method RayPPOTrainer.train was made async in #868. Training still progressed as usual, but the run would have errored out at the end of training, when asyncio.run(...) evaluated the return value of train.

This PR is a follow-up to #1235 that converts SkyRLAgentPPOTrainer.train to an async function.
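
The failure mode described above can be reproduced with a minimal sketch. The class names below (BaseTrainer, SyncOverrideTrainer, AsyncOverrideTrainer) and the launch helper are hypothetical stand-ins, not the SkyRL classes. The sketch assumes the entrypoint calls asyncio.run(trainer.train()): with a sync override, the training body runs to completion while the argument is evaluated, and asyncio.run then rejects the None it receives (ValueError on most Python versions; some newer versions may raise TypeError instead).

```python
import asyncio


class BaseTrainer:
    # Stand-in for a base class whose train() was made async.
    async def train(self):
        return None


class SyncOverrideTrainer(BaseTrainer):
    # Bug: a sync override of an async method.
    def train(self):
        self.steps_run = 3  # the "training" work happens right here
        # Implicitly returns None, not a coroutine.


class AsyncOverrideTrainer(BaseTrainer):
    # Fix: the override is a coroutine function, like the base method.
    async def train(self):
        self.steps_run = 3


def launch(trainer):
    # Hypothetical entrypoint that expects train() to return a coroutine.
    try:
        asyncio.run(trainer.train())
        return "ok"
    except (ValueError, TypeError) as exc:
        return f"failed: {type(exc).__name__}"


sync_trainer = SyncOverrideTrainer()
sync_outcome = launch(sync_trainer)

async_trainer = AsyncOverrideTrainer()
async_outcome = launch(async_trainer)
```

Note that sync_trainer.steps_run is set even though launch reports a failure, which matches the observation that training progressed and the error would only surface at the end.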


Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
@SumanthRH SumanthRH marked this pull request as ready for review February 28, 2026 05:27
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly transitions the SkyRLAgentPPOTrainer.train method to be asynchronous, aligning it with the async method in its base class. The changes replace blocking asyncio.run() calls with non-blocking await expressions, which is the correct approach for handling coroutines within an async function. The asyncio import is also correctly removed as it's no longer used with these changes. I have one point of feedback regarding blocking calls that remain in the train method.
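
As a rough illustration of why the asyncio.run() calls have to become await once train is a coroutine: asyncio.run() refuses to start when an event loop is already running in the current thread. The sketch below uses a hypothetical wake_up coroutine as a stand-in for inference_engine_client.wake_up; it is not the SkyRL implementation.

```python
import asyncio


async def wake_up():
    # Hypothetical stand-in for inference_engine_client.wake_up(...).
    await asyncio.sleep(0)
    return "awake"


async def train_with_nested_run():
    coro = wake_up()
    try:
        # Calling asyncio.run() while a loop is already running raises RuntimeError.
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"


async def train_with_await():
    # Inside a coroutine, await the call directly on the running loop.
    return await wake_up()


nested_outcome = asyncio.run(train_with_nested_run())
awaited_outcome = asyncio.run(train_with_await())
```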

  if self.colocate_all:
      self.policy_model.offload_to_cpu(offload_optimizer=True, offload_model=False)
-     asyncio.run(self.inference_engine_client.wake_up(tags=["weights"]))
+     await self.inference_engine_client.wake_up(tags=["weights"])
Contributor

high

While this change to use await is correct, the train method still contains blocking calls like ray.get() on lines 302 and 422. These calls will block the asyncio event loop, which can negate the benefits of making this method asynchronous. To make this method fully non-blocking, these should be replaced with asynchronous equivalents. For example, you can await Ray's ObjectRefs, possibly using asyncio.gather for lists of them. This would require re-importing asyncio.
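
The concern raised here can be sketched without Ray. In the example below, blocking_fetch is a hypothetical stand-in for a blocking call such as ray.get(), and asyncio.to_thread stands in for an awaitable alternative; neither is taken from the PR. A background ticker task counts how often the event loop gets to run while the fetch is in flight. This only illustrates the general behaviour; whether it matters for a given trainer depends on whether anything else needs the loop at that point.

```python
import asyncio
import time


def blocking_fetch() -> str:
    # Hypothetical stand-in for a blocking call such as ray.get(...).
    time.sleep(0.2)
    return "result"


async def ticker(ticks: list) -> None:
    # Background task; it only advances while the event loop is free.
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def count_ticks(block_loop: bool) -> int:
    ticks: list = []
    task = asyncio.create_task(ticker(ticks))
    await asyncio.sleep(0)  # let the ticker record its first tick
    before = len(ticks)
    if block_loop:
        blocking_fetch()  # runs on the loop thread; no other task can run
    else:
        await asyncio.to_thread(blocking_fetch)  # loop stays free meanwhile
    gained = len(ticks) - before
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return gained


ticks_while_blocked = asyncio.run(count_ticks(block_loop=True))
ticks_while_awaiting = asyncio.run(count_ticks(block_loop=False))
```

With the blocking call, the ticker gains no ticks during the fetch; with the awaited version, it keeps ticking.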

Member Author

This is correct and meant to be synchronous.

The new trainer uses the equivalent dispatch.save_weights_for_sampler() method.

devin-ai-integration[bot]

This comment was marked as resolved.

x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
@SumanthRH SumanthRH merged commit b2f6105 into main Feb 28, 2026