Prioritized Experience Replay Buffer Support by azrael417 · Pull Request #106 · NVIDIA/TorchFort

azrael417 · 2026-06-03T07:42:47Z

This PR adds a prioritized experience replay (PER) buffer, which lets the algorithms weight individual samples according to their importance. PER appears to be an important ingredient in modern RL training, so I thought we should support it. The change looks invasive, but all it actually does is feed per-sample weights to the training algorithms and update the sample priorities accordingly. The interface is universal: the regular replay buffer still works and simply uses uniform weights.
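As a rough illustration of the interface described above, here is a minimal C++ sketch assuming a sampled batch that carries one weight and one buffer index per transition. The names `SampleBatch`, `makeUniformBatch`, `weightedMseLoss` and `prioritiesFromTdErrors` are hypothetical and not TorchFort's API, and the proportional priority |δ| + ε is the standard PER choice, assumed here rather than confirmed for this PR. It shows why a uniform buffer stays compatible: unit weights reduce the weighted loss to the plain MSE.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical batch layout (illustrative names, not TorchFort's API): alongside
// the transitions, the buffer emits one weight and one buffer index per sample.
struct SampleBatch {
  std::vector<std::size_t> indices;  // buffer positions, used later to write back priorities
  std::vector<float> weights;        // importance-sampling weights
};

// A regular (uniform) replay buffer fits the same interface by emitting unit weights.
SampleBatch makeUniformBatch(const std::vector<std::size_t>& indices) {
  return SampleBatch{indices, std::vector<float>(indices.size(), 1.0f)};
}

// Weighted mean-squared TD loss; with unit weights it reduces to the plain MSE.
float weightedMseLoss(const std::vector<float>& td_errors, const std::vector<float>& weights) {
  assert(td_errors.size() == weights.size());
  if (td_errors.empty()) return 0.0f;
  float sum = 0.0f;
  for (std::size_t i = 0; i < td_errors.size(); ++i) {
    sum += weights[i] * td_errors[i] * td_errors[i];
  }
  return sum / static_cast<float>(td_errors.size());
}

// Assumed proportional PER priority p_i = |delta_i| + eps; eps keeps every
// transition sampleable even when its TD error is exactly zero.
std::vector<float> prioritiesFromTdErrors(const std::vector<float>& td_errors, float eps = 1e-6f) {
  std::vector<float> priorities;
  priorities.reserve(td_errors.size());
  for (float delta : td_errors) priorities.push_back(std::abs(delta) + eps);
  return priorities;
}
```

A prioritized buffer would then write the new priorities back at `batch.indices`; a uniform buffer can simply ignore them.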

azrael417 · 2026-06-03T09:50:58Z

/build_and_test

github-actions · 2026-06-03T09:51:09Z

🚀 Build workflow triggered!

github-actions · 2026-06-03T09:58:57Z

❌ Build workflow failed!

azrael417 · 2026-06-03T10:44:02Z

/build_and_test

github-actions · 2026-06-03T10:44:10Z

🚀 Build workflow triggered!

github-actions · 2026-06-03T10:56:52Z

✅ Build workflow passed!

romerojosh · 2026-06-03T22:46:41Z

+      treeUpdate_(t, p_alpha);
+      priorities_[t] = raw_p;
+      max_priority_ = std::max(max_priority_, raw_p);
+      min_p_alpha_ = std::min(min_p_alpha_, p_alpha);


Codex doesn't like this handling of min_p_alpha_. In its words:

min_p_alpha_ is a running minimum that only decreases. Once a very low-priority transition is updated or later overwritten, all future IS weights can remain scaled by that stale value, shrinking critic losses and gradients permanently. This should track the current minimum over active positive priorities, for example with a min-tree or recomputation on update/overwrite.

That seems correct. Wow, this is very subtle, but SB and OpenAI use min trees too. Fixed in the latest updates: the min tree is simple to maintain because it is updated in lock step with the sum tree.
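A minimal C++ sketch of the min-tree fix, assuming each leaf stores p_i^α as `p_alpha` does in the snippet above. `SumMinTree` and `normalizedIsWeight` are hypothetical names, not the merged code. Because both trees are refreshed along the same leaf-to-root path, the minimum is recomputed on every update or overwrite instead of only ever decreasing. Storing empty (zero-priority) leaves as +inf in the min tree is an assumption made here so that the minimum ranges over positive priorities only. The weight helper shows why the minimum matters: with w_i = (N·P(i))^(-β) normalized by its maximum, N and the tree total cancel, leaving (p_i^α / min_j p_j^α)^(-β).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch, not the code merged in this PR: a sum tree and a min tree
// over the same leaves, updated in lock step. Each leaf holds p_i^alpha.
class SumMinTree {
 public:
  explicit SumMinTree(std::size_t capacity) {
    while (cap_ < capacity) cap_ <<= 1;  // round capacity up to a power of two
    sum_.assign(2 * cap_, 0.0);
    min_.assign(2 * cap_, inf());
  }

  // Write p_alpha into leaf i and refresh both trees on the path to the root.
  // Zero-priority leaves are stored as +inf in the min tree so they never win.
  void update(std::size_t i, double p_alpha) {
    assert(i < cap_ && p_alpha >= 0.0);
    std::size_t node = i + cap_;
    sum_[node] = p_alpha;
    min_[node] = p_alpha > 0.0 ? p_alpha : inf();
    for (node /= 2; node >= 1; node /= 2) {
      sum_[node] = sum_[2 * node] + sum_[2 * node + 1];
      min_[node] = std::min(min_[2 * node], min_[2 * node + 1]);
    }
  }

  double leaf(std::size_t i) const { return sum_[i + cap_]; }
  double total() const { return sum_[1]; }
  // Minimum over the positive priorities currently stored; overwrites can raise it.
  double minPositive() const { return min_[1]; }

 private:
  static double inf() { return std::numeric_limits<double>::infinity(); }
  std::size_t cap_ = 1;
  std::vector<double> sum_;
  std::vector<double> min_;
};

// Normalized IS weight w_i / max_j w_j, with w_i = (N * P(i))^(-beta) and
// P(i) = p_i^alpha / total; N and total cancel in the ratio.
double normalizedIsWeight(const SumMinTree& tree, std::size_t i, double beta) {
  return std::pow(tree.leaf(i) / tree.minPositive(), -beta);
}
```

For example, setting leaves 0, 1 and 2 to 0.1, 1.0 and 2.0 and then overwriting leaf 0 with 1.0 leaves `minPositive()` at 1.0, whereas a running minimum would still report 0.1.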

romerojosh · 2026-06-03T22:47:58Z

+    return buffer_[index];
+  }
+
+  bool isReady() const override { return current_size_ >= min_size_; }


Another Codex complaint:

This can report ready before the PER tree has any positive-priority leaf when min_size < nstep. sample() then sees treeTotal_() == 0 and loops forever on zero-priority leaves.
isReady() should require at least one valid priority, or sample() should fail fast on total <= 0.

That sounds like a very exotic edge case, but I agree it should be fixed. Done now.
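One way to implement both guards suggested in the review, sketched under the assumption that the p^α values live in a power-of-two sum tree: `isReady()` additionally requires positive total mass, and `sample()` throws instead of looping when the total is not positive. `PrioritySampler`, `set` and `setSize` are illustrative names, not TorchFort's API.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical sketch (illustrative names): proportional sampling from a sum tree
// that fails fast when no leaf has positive priority, instead of spinning forever.
class PrioritySampler {
 public:
  PrioritySampler(std::size_t capacity, std::size_t min_size) : min_size_(min_size) {
    while (cap_ < capacity) cap_ <<= 1;
    sum_.assign(2 * cap_, 0.0);
  }

  // Store a transition's p^alpha at position i (0 means "no valid priority yet").
  void set(std::size_t i, double p_alpha) {
    assert(i < cap_ && p_alpha >= 0.0);
    std::size_t node = i + cap_;
    sum_[node] = p_alpha;
    for (node /= 2; node >= 1; node /= 2) sum_[node] = sum_[2 * node] + sum_[2 * node + 1];
  }

  void setSize(std::size_t n) { size_ = n; }  // number of stored transitions

  // Ready only when enough transitions are stored AND the tree has positive mass.
  bool isReady() const { return size_ >= min_size_ && sum_[1] > 0.0; }

  // Draw one index with probability p_i^alpha / total.
  std::size_t sample(std::mt19937& rng) const {
    const double total = sum_[1];
    if (!(total > 0.0)) throw std::runtime_error("PER: no transition with positive priority");
    std::uniform_real_distribution<double> dist(0.0, total);
    double u = dist(rng);
    std::size_t node = 1;
    while (node < cap_) {
      const double left = sum_[2 * node];
      // Descend left if u falls in the left mass, or if the right subtree is empty
      // (guards against u rounding up to the total and landing on a zero leaf).
      if (u < left || sum_[2 * node + 1] <= 0.0) {
        node = 2 * node;
      } else {
        u -= left;
        node = 2 * node + 1;
      }
    }
    return node - cap_;
  }

 private:
  std::size_t cap_ = 1;
  std::size_t min_size_;
  std::size_t size_ = 0;
  std::vector<double> sum_;
};
```

Preferring the left child when the right subtree is empty is an extra choice of this sketch; it ensures a positive-priority leaf is returned even if the random draw rounds up to the total.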

romerojosh

LGTM!

azrael417 added 2 commits April 28, 2026 02:58

extending RB interface to emit weights and indices as well, allowing …

b634cef

…to implement PER buffer Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

implemented PER including tests and interface helpers

67baf8e

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

azrael417 requested a review from romerojosh June 3, 2026 07:42

azrael417 self-assigned this Jun 3, 2026

fixing lvalue issue

3f37ba7

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

azrael417 force-pushed the tkurth/rl-prioritized-replay-buffer branch from f0bad80 to 3f37ba7 June 3, 2026 07:44

formatting

12eb806

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

azrael417 changed the title ~~Tkurth/rl prioritized replay buffer~~ Prioritized Experience Replay Buffer Support Jun 3, 2026

adding description of PER to docs

33fd897

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

azrael417 marked this pull request as draft June 3, 2026 10:15

Fixing PER tests and making PER n_env syntax compliant

608280f

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

azrael417 marked this pull request as ready for review June 3, 2026 10:43

romerojosh reviewed Jun 3, 2026

azrael417 added 3 commits June 3, 2026 22:11

fixing issues introduced by clang-format

529593a

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

adding mintree in lock step with sum tree and also guard against the …

d001f1c

…unavailability of samples Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

fixing clang formatting

6173dee

Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>

romerojosh approved these changes Jun 4, 2026

romerojosh merged commit d09e039 into master Jun 4, 2026
4 checks passed

romerojosh merged 9 commits into master from tkurth/rl-prioritized-replay-buffer
