Skip to content

✨ feat(Ray): Enhance Ray #2847

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

binw666
Copy link

@binw666 binw666 commented Jul 3, 2025

Description

This PR:

  • Add task_runner.py to support specifying resources, py_modules, and pip.

Fixes # (issue)

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

Add task_runner.py to support specifying resources, py_modules, and pip.

isaac-sim#2632
@binw666
Copy link
Author

binw666 commented Jul 4, 2025

@ozhanozen @garylvov I have added task_runner.py as an additional method for submitting jobs. This method supports specifying resources, py_modules, and pip dependencies. If you have time, could you please help review the code? Thank you!

@garylvov
Copy link
Collaborator

garylvov commented Jul 8, 2025

Hi I will review by the end of this weekend please ping me if no activity within 1 week


YAML configuration example:
---------------------------
.. code-block:: yaml
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can you show a real example in a config file as well if possible? Maybe will make this example a bit more clear

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the most recent commit, I have updated the comments and included an official example from IsaacLab for multi-GPU training. This may help clarify the usage.

return True


def submit_tasks(ray_address,pip,py_modules,tasks):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does resemble the submit_job function in submit_job.py, but it serves a different purpose. Here, the function calls remote to execute, whereas submit_job submits tasks via client.submit_job. Do you have any better suggestions?

@garylvov
Copy link
Collaborator

garylvov commented Jul 8, 2025

Hi @binw666 thank you for the PR please let me know if you have any questions about my comments I'm happy to help. Once you address these issues just rerequest me for the review or @ me and I'll check it out again.

Thanks!

@binw666
Copy link
Author

binw666 commented Jul 8, 2025

@garylvov Thank you very much for your review. I have updated the code and documentation to address the above issues.

@ozhanozen
Copy link
Contributor

@binw666, I don't have a HW currently that I can use to test the functionality of the PR, but I try to give feedback on other stuff as much as possible.

I noticed that some of the highlighted line numbers of the file ray.rst are outdated (some before your changes). Would it be possible for you to go over the file and update them as the last step once everything else is done?

@ozhanozen
Copy link
Contributor

@garylvov @binw666

I’m not sure if this PR is the right place to bring this up or if it should be addressed in a separate one, but based on my experience, I believe the current Ray integration is missing a key feature: support for a custom CLIReporter.

Right now, the integration defaults to the standard CLIReporter, which often displays metrics that aren’t particularly relevant. Similar to how we allow users to specify a --cfg_class (e.g., CartpoleTheiaJobCfgE), I suggest we let them optionally provide a custom CLIReporter class (that goes to the RunConfig as an argument). If none is provided, we can fall back to the default.

What do you think about this?

@garylvov
Copy link
Collaborator

garylvov commented Jul 8, 2025

I think a custom CLIReporter could be good. As for including it in this PR, I am personally ok with it, but it's up to @binw666 . I could see why @binw666 may want to limit scope here, in which case we can follow up with another PR after this one is merged.

@garylvov
Copy link
Collaborator

garylvov commented Jul 8, 2025

That being said, it's probably quickest to include that functionality in this PR in terms of getting things merged into main

@garylvov garylvov self-requested a review July 9, 2025 00:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Proposal] Improve Current Ray Training Scripts
3 participants