
feat: support A3C parallel in multiple GPUs #282

Merged
merged 8 commits into from
Nov 3, 2023

Conversation

Gaiejj (Member) commented Oct 19, 2023

Description

Update the torch.distributed-related files to support A3C training on multiple GPUs.

We support two methods for launching your training script on specific GPUs:

  • Begin with an initial GPU
    You can run your task starting from a specific GPU. For example, to run PPOLag with parallel=4, you can run:
python train_policy.py --algo PPOLag --parallel 4 --device cuda:0

The task then runs on GPUs cuda:0, cuda:1, cuda:2, and cuda:3.
If you instead run:

python train_policy.py --algo PPOLag --parallel 4 --device cuda:2

the task runs on GPUs cuda:2, cuda:3, cuda:4, and cuda:5.
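The consecutive-GPU behavior described above can be sketched as a small helper. This is illustrative only; the name `devices_for_parallel` is hypothetical, and the real launch logic lives in omnisafe/utils/distributed.py:

```python
def devices_for_parallel(base_device: str, parallel: int) -> list[str]:
    """Map a starting CUDA device plus a parallel count to the device list.

    Mirrors the behavior described above: training occupies `parallel`
    consecutive GPUs starting from the index given by `--device`.
    """
    base_index = int(base_device.split(":")[1])
    return [f"cuda:{base_index + i}" for i in range(parallel)]


print(devices_for_parallel("cuda:2", 4))
# ['cuda:2', 'cuda:3', 'cuda:4', 'cuda:5']
```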

  • Begin with specific GPUs
    You can also run your task on a chosen set of GPUs. For example, to run PPOLag with parallel=4 on GPUs 3, 4, 6, and 7, you can run:
export CUDA_VISIBLE_DEVICES=3,4,6,7
python train_policy.py --algo PPOLag --parallel 4 --device cuda:0

With nvitop, you can observe that devices 3, 4, 6, and 7 are activated.
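With CUDA_VISIBLE_DEVICES set, the CUDA runtime remaps logical device indices onto the visible physical GPUs, so cuda:0 above refers to physical GPU 3. A minimal sketch of that remapping (the helper name is hypothetical; the remapping itself is done by the CUDA runtime, not by user code):

```python
import os


def logical_to_physical(logical: str) -> str:
    """Translate a logical CUDA index into the physical GPU id implied by
    CUDA_VISIBLE_DEVICES. Sketch of the runtime's remapping behavior."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    physical_ids = [v.strip() for v in visible.split(",") if v.strip()]
    index = int(logical.split(":")[1])
    return f"GPU {physical_ids[index]}"


os.environ["CUDA_VISIBLE_DEVICES"] = "3,4,6,7"
print(logical_to_physical("cuda:0"))  # GPU 3
print(logical_to_physical("cuda:3"))  # GPU 7
```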

WARNING
Please make sure the number of available GPUs is at least as large as parallel.

Motivation and Context

Resolve #281

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide. (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly. (required for a bug fix or a new feature)
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format. (required)
  • I have checked the code using make lint. (required)
  • I have ensured make test pass. (required)

@Gaiejj Gaiejj added enhancement New feature or request feature Something related to new features labels Oct 19, 2023
codecov bot commented Oct 19, 2023

Codecov Report

Merging #282 (ff4f620) into main (9d943b6) will decrease coverage by 0.11%.
The diff coverage is 69.44%.

❗ Current head ff4f620 differs from the pull request's most recent head 3fb000c. Consider uploading reports for commit 3fb000c to get more accurate results.

@@            Coverage Diff             @@
##             main     #282      +/-   ##
==========================================
- Coverage   97.00%   96.89%   -0.11%     
==========================================
  Files         138      138              
  Lines        6991     7000       +9     
==========================================
+ Hits         6781     6782       +1     
- Misses        210      218       +8     
Files Coverage Δ
omnisafe/algorithms/offline/vae_bc.py 100.00% <100.00%> (ø)
...isafe/algorithms/on_policy/base/policy_gradient.py 100.00% <100.00%> (ø)
omnisafe/algorithms/on_policy/base/trpo.py 92.96% <100.00%> (ø)
omnisafe/algorithms/on_policy/first_order/cup.py 95.31% <100.00%> (ø)
...mnisafe/algorithms/on_policy/first_order/focops.py 95.08% <100.00%> (ø)
omnisafe/algorithms/on_policy/second_order/cpo.py 93.51% <100.00%> (+0.65%) ⬆️
omnisafe/algorithms/on_policy/second_order/pcpo.py 100.00% <100.00%> (ø)
omnisafe/common/logger.py 97.50% <100.00%> (ø)
omnisafe/algorithms/algo_wrapper.py 91.92% <37.50%> (-2.87%) ⬇️
omnisafe/utils/distributed.py 87.50% <25.00%> (-5.36%) ⬇️


@Gaiejj Gaiejj mentioned this pull request Oct 19, 2023
3 tasks
@zmsn-2077 zmsn-2077 merged commit d55958a into PKU-Alignment:main Nov 3, 2023
4 checks passed