Documentation for using Kuberay #266
You can also just dump the commands you use. I can help with the docs from a user perspective once I get it set up.
@karthik-nexusflow Setting up a Ray cluster and submitting an OpenRLHF job to that cluster are two separate stages.
# start head node first
ray start --head --port=6379 --node-ip-address=10.0.0.1
# start worker node 1
ray start --node-ip-address=10.0.0.2 --address=10.0.0.1:6379
# start worker node 2
ray start --node-ip-address=10.0.0.3 --address=10.0.0.1:6379
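The worker commands above differ only in the node IP, so for a larger cluster they can be generated rather than typed by hand. A minimal sketch, assuming the same head address and port as above; the helper names `head_cmd` and `worker_cmd` are hypothetical, and the functions only print the commands so they can be reviewed before being run on each node:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print (do not run) the ray start command for each node.
HEAD_IP="10.0.0.1"   # head node address from the example above
HEAD_PORT="6379"     # head node port from the example above

head_cmd() {
  echo "ray start --head --port=${HEAD_PORT} --node-ip-address=${HEAD_IP}"
}

worker_cmd() {
  # $1: IP address of the worker node that joins the head
  echo "ray start --node-ip-address=$1 --address=${HEAD_IP}:${HEAD_PORT}"
}

head_cmd
for ip in 10.0.0.2 10.0.0.3; do
  worker_cmd "$ip"
done
```

Each printed line still has to be executed on the matching node, head node first.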
# then submit the OpenRLHF job to the running cluster
ray job submit --address="http://127.0.0.1:8265" \
--runtime-env-json='{"working_dir": "/openrlhf", "pip": "/openrlhf/requirements.txt"}' \
--no-wait \
-- python3 examples/train_ppo_ray.py \
...
Stage 2 is independent of how you launch the Ray cluster, and you can submit multiple jobs to the same cluster.
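Since the submission step only needs the dashboard address, it can be wrapped in a small helper and reused for every job sent to the same cluster. A minimal sketch, assuming the dashboard address and runtime environment shown above; `submit_cmd` is a hypothetical name, the function only prints the command, and the training arguments elided in the thread are left out rather than guessed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a ray job submit command for an entrypoint.
RAY_DASHBOARD="http://127.0.0.1:8265"   # dashboard address from the example above
RUNTIME_ENV='{"working_dir": "/openrlhf", "pip": "/openrlhf/requirements.txt"}'

submit_cmd() {
  # "$*": the entrypoint command and its arguments, placed after "--"
  printf "ray job submit --address=\"%s\" --runtime-env-json='%s' --no-wait -- %s\n" \
    "$RAY_DASHBOARD" "$RUNTIME_ENV" "$*"
}

# One call per job; every job targets the same running cluster.
submit_cmd python3 examples/train_ppo_ray.py
```

Because `--no-wait` returns immediately, `ray job status <job_id>` and `ray job logs <job_id>` can be used afterwards to follow each submitted job.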
Thank you. For 1 (KubeRay), it would be great if you could share the Dockerfile you are using. For 2, setting up passwordless SSH has some issues on our cluster. Is it strictly necessary? When you tried that method, how did you go about it?
We have provided a vLLM-based Dockerfile: https://github.com/OpenLLMAI/OpenRLHF/tree/main/dockerfile
Hi Team,
It would be great if the KubeRay commands for running OpenRLHF were added to the docs, to make a cold-start setup easier.