Prioritized replay buffer #175

Merged
saleml merged 10 commits into master from prioritized_replay_buffer on Apr 3, 2024

Conversation

josephdviviano
Collaborator

I've added a prioritized replay buffer. This buffer:

  • Only adds examples if their reward is larger than the min reward found in the buffer.
  • Only adds examples if they are unique (by default).

In general, uniqueness is defined in terms of distances between the candidate batch states and the buffer states, computed with a p-norm -- this is configurable by the user. The default settings use a p-norm of 1 and a distance threshold of 0, i.e., added states must not be identical to any state already in the buffer. A rough sketch of both filters is given below.
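
To make the two criteria concrete, here is a minimal sketch of the filtering logic (not the exact implementation in this PR; the function and tensor names are hypothetical):

    import torch

    def select_new_entries(
        candidate_states, candidate_rewards, buffer_states, buffer_rewards,
        p_norm=1.0, cutoff_distance=0.0,
    ):
        """Mask of candidates that beat the worst buffered reward and are
        farther than `cutoff_distance` from every state already stored."""
        # Reward criterion: candidate must beat the minimum reward in the buffer.
        better_than_min = candidate_rewards > buffer_rewards.min()

        # Diversity criterion: distance to the nearest buffered state must exceed
        # the cutoff (0 means "not identical to anything already stored").
        dists = torch.cdist(candidate_states, buffer_states, p=p_norm)  # (n_candidates, n_buffer)
        diverse = dists.min(dim=1).values > cutoff_distance

        return better_than_min & diverse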

Important: you can test this using

tutorials/examples/train_hypergrid.py --replay_buffer_size 1000 --replay_buffer_prioritized

Note: currently, the standard buffer outperforms the prioritized buffer using these default settings!

  • Standard: 'loss': 9.107343066716567e-05, 'states_visited': 998416, 'l1_dist': 0.00023296871222555637, 'logZ_diff': 0.001130819320678711
  • Prioritized: 'loss': 0.0003514138516038656, 'states_visited': 998416, 'l1_dist': 0.00017267849761992693, 'logZ_diff': 0.0020639896392822266

In the debugger, I determined that no samples were ever added to the buffer after it was initially filled, because none of the incoming states were found to be unique. I.e., in replay_buffer.py, the following logic always produced an idx_batch_buffer that was entirely False:

    # Remove non-diverse examples according to the above distances.
    idx_batch_batch = batch_batch_dist > self.cutoff_distance
    idx_batch_buffer = batch_buffer_dist > self.cutoff_distance
    idx_diverse = idx_batch_batch & idx_batch_buffer
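
For intuition, here is a toy reproduction of that behaviour (hypothetical values; batch_buffer_dist is read here as each candidate's distance to its nearest buffered state):

    import torch

    cutoff_distance = 0.0
    # If every candidate state coincides with some state already in the buffer,
    # its nearest-neighbour distance is exactly 0, and the strict `>` rejects it.
    batch_buffer_dist = torch.tensor([0.0, 0.0, 0.0])
    idx_batch_buffer = batch_buffer_dist > cutoff_distance
    print(idx_batch_buffer)  # tensor([False, False, False]) -- nothing gets added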

@saleml I'd be curious to get your opinion on this. Perhaps we can tweak the implementation of the prioritized replay buffer, or perhaps this should be expected behaviour for this relatively simple example. I am not sure.

@josephdviviano self-assigned this Mar 30, 2024
    if self._log_rewards is not None and other._log_rewards is not None:
        self._log_rewards = torch.cat(
            (self._log_rewards, other._log_rewards),
            dim=0,
        )
    # If the trajectories object does not yet have `log_rewards` assigned but the
    # external trajectory has log_rewards, simply assign them over.
    elif self._log_rewards is None and other._log_rewards is not None:
Collaborator

why is this needed?
This is actually dangerous and can easily lead to undesired behavior.

Collaborator Author

We had a situation in the buffer where the freshly initialized, empty trajectory had _log_rewards = None, so any call to .extend() did not update _log_rewards. We could handle this a few ways, but as is, thankfully, tests are passing.
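
A minimal sketch of what the branch in question does (the elif body is cut off in the diff above; the assignment shown here is inferred from the comment and is not copied from the PR):

    elif self._log_rewards is None and other._log_rewards is not None:
        # The receiving Trajectories object started out empty, so adopt the
        # incoming log rewards directly rather than concatenating onto None.
        self._log_rewards = other._log_rewards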

Comment on lines +256 to +262
# Ensure log_probs/rewards are the correct dimensions. TODO: Remove?
if self.log_probs.numel() > 0:
assert self.log_probs.shape == self.actions.batch_shape

if self.log_rewards is not None:
assert len(self.log_rewards) == self.actions.batch_shape[-1]

Collaborator

fair

Collaborator Author

Yes, it's good to check -- but ideally we wouldn't have to, since this adds overhead.
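
One low-cost option (a sketch, not something this PR implements): plain assert statements are already stripped when Python runs with the -O flag, and heavier validation could additionally be gated behind an explicit flag (debug_checks is a hypothetical attribute):

    if self.debug_checks:
        # Shape checks only run when explicitly requested, keeping the hot path cheap.
        if self.log_probs.numel() > 0:
            assert self.log_probs.shape == self.actions.batch_shape
        if self.log_rewards is not None:
            assert len(self.log_rewards) == self.actions.batch_shape[-1]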

to_add = len(training_objects)

self._is_full |= self._index + to_add >= self.capacity
self._index = (self._index + to_add) % self.capacity
Collaborator

But if we don't add all the training objects, we would have increased self._index by more than needed.

Actually, do we need self._index at all in ReplayBuffers?

Collaborator Author

It isn't clear to me what this is used for, actually. We can chat about it at our meeting.
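
For reference, a sketch of the reviewer's point (hypothetical variable name; not code from this PR): the cursor stays consistent if it is advanced only by the number of objects that survive the filters.

    # `kept_objects` stands for the batch after the reward/diversity filters.
    to_add = len(kept_objects)
    self._is_full |= self._index + to_add >= self.capacity
    self._index = (self._index + to_add) % self.capacity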

Base automatically changed from fix_off_policy to master April 2, 2024 14:57
@josephdviviano
Collaborator Author

@saleml Just need your official sign off on this before I can merge :)

@saleml (Collaborator) left a comment

LGTM! Great PR

@saleml merged commit 4387e5b into master Apr 3, 2024
3 checks passed
@josephdviviano deleted the prioritized_replay_buffer branch April 5, 2024 18:12