Prioritized Experience Replay Buffer Support #106
…to implement PER buffer Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
f0bad80 to 3f37ba7
/build_and_test
🚀 Build workflow triggered!
❌ Build workflow failed!
/build_and_test
🚀 Build workflow triggered!
✅ Build workflow passed!
    treeUpdate_(t, p_alpha);
    priorities_[t] = raw_p;
    max_priority_ = std::max(max_priority_, raw_p);
    min_p_alpha_ = std::min(min_p_alpha_, p_alpha);
Codex doesn't like this handling of min_p_alpha_. In its words:
min_p_alpha_ is a running minimum that only decreases. Once a very low-priority transition is updated or later overwritten, all future IS weights can remain scaled by that stale value, shrinking critic losses and gradients permanently. This should track the current minimum over active positive priorities, for example with a min-tree or recomputation on update/overwrite.
That seems correct. Wow, this is very subtle, but SB and OpenAI use min-trees too. Fixed in the latest updates. It is simple to maintain since it runs in parallel to the sum tree.
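For readers following the thread, here is a minimal sketch of the min-tree idea, not the code from this PR. The class name `MinTree` and its methods are hypothetical. It assumes empty or zero-priority slots are stored as +infinity, so the root always reflects the current minimum over active leaves instead of a stale running minimum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical min segment tree (illustration only, not the PR's class).
// Leaves live at [capacity, 2 * capacity); the root is nodes_[1].
// Inactive slots hold +infinity so they never win the minimum.
class MinTree {
 public:
  explicit MinTree(std::size_t capacity)
      : capacity_(capacity),
        nodes_(2 * capacity, std::numeric_limits<float>::infinity()) {
    assert(capacity > 0);
  }

  // Overwrite leaf `index` and recompute every ancestor up to the root.
  // Because ancestors are recomputed from both children, raising or
  // overwriting the smallest leaf also raises the reported minimum.
  void update(std::size_t index, float p_alpha) {
    assert(index < capacity_);
    std::size_t node = index + capacity_;
    nodes_[node] = p_alpha;
    for (node /= 2; node >= 1; node /= 2) {
      nodes_[node] = std::min(nodes_[2 * node], nodes_[2 * node + 1]);
    }
  }

  // Current minimum over all leaves; +infinity if no leaf is active.
  float min() const { return nodes_[1]; }

 private:
  std::size_t capacity_;
  std::vector<float> nodes_;
};
```

Each update costs O(log N), the same as the sum-tree update, which is why the two structures can be maintained side by side.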
    return buffer_[index];
  }

  bool isReady() const override { return current_size_ >= min_size_; }
Another Codex complaint:
This can report ready before the PER tree has any positive-priority leaf when min_size < nstep. sample() then sees treeTotal_() == 0 and loops forever on zero-priority leaves.
isReady() should require at least one valid priority, or sample() should fail fast on total <= 0.
That sounds like a super exotic edge case but I agree, this should be fixed. Done now.
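Both remedies suggested in the review can be sketched as follows. This is an illustration under stated assumptions, not the fix that landed: `SumTree`, `find`, and the free function `isReady` are hypothetical names, and the leaf count is assumed to be a power of two so that the prefix-sum descent visits leaves in index order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat sum tree (illustration only, not the PR's class).
// Leaves live at [capacity, 2 * capacity); the root is nodes[1].
struct SumTree {
  std::size_t capacity;
  std::vector<float> nodes;

  explicit SumTree(std::size_t cap) : capacity(cap), nodes(2 * cap, 0.0f) {
    assert(cap > 0 && (cap & (cap - 1)) == 0);  // power of two assumed
  }

  // Overwrite leaf `index` and recompute the sums on the path to the root.
  void update(std::size_t index, float p_alpha) {
    assert(index < capacity);
    std::size_t node = index + capacity;
    nodes[node] = p_alpha;
    for (node /= 2; node >= 1; node /= 2) {
      nodes[node] = nodes[2 * node] + nodes[2 * node + 1];
    }
  }

  float total() const { return nodes[1]; }

  // Prefix-sum descent for u in [0, 1). Returns false immediately when
  // no leaf has positive priority, so the caller cannot spin forever
  // retrying on zero-priority leaves.
  bool find(float u, std::size_t& leaf) const {
    if (!(total() > 0.0f)) return false;  // fail fast on total <= 0
    float target = u * total();
    std::size_t node = 1;
    while (node < capacity) {
      const std::size_t left = 2 * node;
      if (target < nodes[left]) {
        node = left;
      } else {
        target -= nodes[left];
        node = left + 1;
      }
    }
    leaf = node - capacity;
    return true;
  }
};

// Readiness requires enough stored transitions AND some positive priority.
inline bool isReady(const SumTree& tree, std::size_t current_size,
                    std::size_t min_size) {
  return current_size >= min_size && tree.total() > 0.0f;
}
```

With a guard of this kind, a buffer whose entries all still have zero priority reports not ready, and a direct sampling attempt returns an error instead of looping.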
…unavailability of samples Signed-off-by: Thorsten Kurth <tkurth@nvidia.com>
This MR adds a prioritized experience replay (PER) buffer, which lets the algorithms weight individual samples according to importance. This seems to be an important ingredient in modern RL training, so I thought we might want to support it. The change looks invasive, but all it actually does is feed per-sample weights to the training algorithms and update them accordingly. The interface is universal, though: the regular replay buffer still works and uses uniform weighting.
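As a rough illustration of the weights mentioned above, the sketch below uses the common PER formulation of importance-sampling weights normalized by the largest possible weight. Whether this PR normalizes in exactly this way is an assumption; the function names and the parameters `n` (number of stored transitions) and `beta` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (illustration only). Computes
//   w_i = (n * P(i))^(-beta) / max_j w_j,  with  P(i) = p_i^alpha / total.
// The largest weight belongs to the smallest active priority, which is
// why a stale minimum would shrink every weight.
inline float importanceWeight(float p_alpha, float total, float min_p_alpha,
                              std::size_t n, float beta) {
  assert(p_alpha > 0.0f && min_p_alpha > 0.0f && total > 0.0f && n > 0);
  const float prob = p_alpha / total;
  const float min_prob = min_p_alpha / total;
  const float weight = std::pow(static_cast<float>(n) * prob, -beta);
  const float max_weight = std::pow(static_cast<float>(n) * min_prob, -beta);
  return weight / max_weight;
}

// A uniform replay buffer can return the same constant for every sample,
// so training code consumes weights through one interface.
inline float uniformWeight() { return 1.0f; }
```

In this formulation the weight multiplies each sample's loss term, and the new TD errors are written back to the buffer as updated priorities.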