
Improve normalized advantage calculation #1642

Merged

emailweixu merged 2 commits into pytorch from PR_ppo_advantage_improvement

May 15, 2024

Commits on May 7, 2024

Improve normalized advantage calculation

1. Use LazyBatchNorm for advantage normalization. The motivation is that,
under DDP training, the normalization statistics are combined across all GPUs.

2. Calculate the normalized advantage in PPOAlgorithm.preprocess so that
the normalization statistics are computed over a much larger batch.
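The two changes can be illustrated with a small pure-Python sketch. It is not the ALF code and does not use LazyBatchNorm. It only mimics the two ideas. First, each worker (standing in for a GPU) contributes its count, sum and sum of squares, and these are summed as a DDP all-reduce would, which yields global statistics. Second, each worker's whole batch is normalized once with those statistics before it is split into mini-batches. The names `local_stats`, `combine_stats`, `normalize` and `minibatches` and the advantage values are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def local_stats(advantages: Sequence[float]) -> Tuple[int, float, float]:
    """Per-worker sufficient statistics: (count, sum, sum of squares)."""
    return len(advantages), sum(advantages), sum(a * a for a in advantages)


def combine_stats(stats: List[Tuple[int, float, float]]) -> Tuple[float, float]:
    """Sum per-worker statistics (a stand-in for a DDP all-reduce) and
    return the global mean and biased variance."""
    n = sum(s[0] for s in stats)
    mean = sum(s[1] for s in stats) / n
    var = max(sum(s[2] for s in stats) / n - mean * mean, 0.0)
    return mean, var


def normalize(advantages: Sequence[float], mean: float, var: float,
              eps: float = 1e-8) -> List[float]:
    """Standardize advantages with the given (global) statistics."""
    std = math.sqrt(var + eps)
    return [(a - mean) / std for a in advantages]


def minibatches(xs: Sequence[float], size: int) -> List[List[float]]:
    """Split a batch into consecutive mini-batches."""
    return [list(xs[i:i + size]) for i in range(0, len(xs), size)]


# Hypothetical advantages held by two workers (GPUs).
worker_batches = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]

# Statistics combined across all workers.
mean, var = combine_stats([local_stats(b) for b in worker_batches])

# Normalize each worker's full batch once, then split into mini-batches.
normalized = [normalize(b, mean, var) for b in worker_batches]
mbs = minibatches(normalized[0], 2)

# Contrast: normalizing each 2-element mini-batch on its own.
per_mb = [normalize(mb, *combine_stats([local_stats(mb)]))
          for mb in minibatches(worker_batches[0], 2)]

print(mean, var)  # 4.5 5.25
```

With per-mini-batch normalization, both `[1.0, 2.0]` and `[3.0, 4.0]` map to roughly `[-1, 1]`, which discards the fact that the second mini-batch has larger advantages. Normalizing with statistics from the larger, combined batch keeps that ordering.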

emailweixu committed May 7, 2024

Remove outdated unittest

emailweixu committed May 7, 2024
5c6af18

Browse the repository at this point in the history