RL: first round of some fixes and improvements #103

Merged
azrael417 merged 1 commit into NVIDIA:master from azrael417:tkurth/rl-improvements-and-fixes
Apr 23, 2026

Conversation

azrael417 (Collaborator) commented Apr 20, 2026

This MR adds a few fixes and improvements to the RL code in the branch. Most notably:

  • I had misunderstood the roles of the target and active networks. The active networks are the important ones; the target networks exist only for stabilization and should never be used externally, i.e. during inference. I fixed this throughout the code. This can be considered a critical fix and should improve skill drastically.
  • In SAC, target_entropy was not wired up correctly in the interface and was always ignored.
  • Fixed some warnings/errors.
  • Added comments for practitioners about useful parameter choices.
  • Added missing DDP all-reduces for loss reporting in some algorithms. This did not affect training, only the reported loss values.
  • In DDPG and SAC, the Polyak update was not synchronized with the optimizer updates when using gradient accumulation. This was fixed. For the other algorithms this was already correct.
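
To illustrate the last point together with the target/active distinction, here is a minimal pure-Python sketch. It is not the code from this PR: the class `ToyAgent`, the parameter names `rho`, `lr`, and `accum_steps`, and the update convention `target <- rho * target + (1 - rho) * active` are all assumptions made for illustration. The sketch applies the Polyak update only when an optimizer step actually happens, and it serves predictions from the active parameters only.

```python
from typing import List


def polyak_update(target: List[float], active: List[float], rho: float) -> None:
    """In-place soft update (assumed convention): target <- rho*target + (1-rho)*active."""
    for i, (t, a) in enumerate(zip(target, active)):
        target[i] = rho * t + (1.0 - rho) * a


class ToyAgent:
    """Hypothetical agent with linear 'networks' stored as lists of weights."""

    def __init__(self, weights: List[float], rho: float = 0.9,
                 lr: float = 0.5, accum_steps: int = 4) -> None:
        self.active = list(weights)   # trained by the optimizer, used for inference
        self.target = list(weights)   # slow-moving copy, used only inside training
        self.rho = rho
        self.lr = lr
        self.accum_steps = accum_steps
        self._grad = [0.0] * len(weights)
        self._micro_steps = 0
        self.optimizer_steps = 0
        self.polyak_updates = 0

    def train_step(self, grad: List[float]) -> bool:
        """Accumulate one micro-batch gradient. Returns True if an optimizer step ran."""
        for i, g in enumerate(grad):
            self._grad[i] += g / self.accum_steps
        self._micro_steps += 1
        if self._micro_steps < self.accum_steps:
            # Still accumulating: no optimizer step, hence no Polyak update either.
            return False

        # Plain gradient descent step on the active parameters.
        for i, g in enumerate(self._grad):
            self.active[i] -= self.lr * g
        self.optimizer_steps += 1

        # The soft update runs once per optimizer step, not once per micro-batch.
        polyak_update(self.target, self.active, self.rho)
        self.polyak_updates += 1

        self._grad = [0.0] * len(self.active)
        self._micro_steps = 0
        return True

    def predict(self, x: List[float]) -> float:
        """Inference reads the active parameters; the target copy is never exposed."""
        return sum(w * xi for w, xi in zip(self.active, x))


agent = ToyAgent([1.0, 2.0])
for _ in range(8):                    # 8 micro-batches -> 2 optimizer steps
    agent.train_step([1.0, -1.0])
print(agent.optimizer_steps, agent.polyak_updates)
print(agent.predict([1.0, 1.0]))
```

If the soft update instead ran on every micro-batch, the target parameters would be blended toward the same unchanged active parameters several times between optimizer steps, so their effective averaging rate would depend on `accum_steps`.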

azrael417 requested a review from romerojosh April 20, 2026 07:52
azrael417 self-assigned this Apr 20, 2026
azrael417 changed the title from "first round of some fixes and improvements" to "RL: first round of some fixes and improvements" Apr 20, 2026
Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
azrael417 force-pushed the tkurth/rl-improvements-and-fixes branch from ea33e12 to e2f2fb8 April 20, 2026 08:41
azrael417 marked this pull request as ready for review April 20, 2026 09:19
azrael417 (Collaborator, Author)

/build_and_test

github-actions

🚀 Build workflow triggered! View run

github-actions

✅ Build workflow passed! View run

romerojosh (Collaborator) left a comment

LGTM!

azrael417 merged commit 4d29c77 into NVIDIA:master Apr 23, 2026
6 checks passed
2 participants