Random sampling in tools::sample_episodes #42

Open
dirkmcpherson opened this issue Oct 31, 2023 · 1 comment

@dirkmcpherson

Hi,

I was going over your dataset code and noticed that you're sampling from the episode buffer randomly. That's generally the right call, since subsequent episodes are strongly correlated, but your technique picks a random episode at each step (sampling with replacement) rather than guaranteeing that every episode is seen once before any episode is seen twice.

It's probably not a big deal, since the sampling is uniform on average, but I was wondering whether there was a reason for this implementation choice?
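To illustrate the distinction I mean, here is a minimal sketch (hypothetical function names, not the repo's code) contrasting with-replacement sampling against epoch-style shuffled sampling, where every episode is seen once per pass before any repeats:

```python
import random

def sample_with_replacement(episodes, n, rng=random):
    # Each draw is independent, so an episode can repeat
    # before all others have been seen.
    return [rng.choice(episodes) for _ in range(n)]

def sample_epochs(episodes, n, rng=random):
    # Shuffle the buffer, walk it end to end, then reshuffle:
    # every episode appears exactly once per epoch.
    out = []
    while len(out) < n:
        epoch = list(episodes)
        rng.shuffle(epoch)
        out.extend(epoch)
    return out[:n]
```

Both are uniform in expectation; the epoch version just bounds how long any single episode can go unsampled.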

Thanks again for writing this repo.

@NM512
Owner

NM512 commented Nov 4, 2023

Hi,

Thank you for your question.

In the context of off-policy reinforcement learning, it's common practice to sample steps stochastically from the replay buffer. As a reference, the implementation in the original DreamerV3 uses a similar approach: it randomly selects chunks of 1024 successive steps and then samples sequences from those chunks.
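Roughly, that chunk-then-sequence idea can be sketched like this (illustrative names and sizes only, not the actual DreamerV3 code):

```python
import random

def sample_sequence(steps, chunk_size=1024, seq_len=64, rng=random):
    """Pick a random chunk of successive steps, then a random
    contiguous sequence inside that chunk.

    `steps` is a flat list of transitions; chunk_size echoes the 1024
    mentioned above, and seq_len is an assumed training sequence length.
    """
    n_chunks = max(1, len(steps) // chunk_size)
    c = rng.randrange(n_chunks)
    chunk = steps[c * chunk_size:(c + 1) * chunk_size]
    start = rng.randrange(max(1, len(chunk) - seq_len + 1))
    return chunk[start:start + seq_len]
```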

In my repository, data is saved on an episode-by-episode basis within the replay buffer. This choice was made because it makes individual episode data easier to handle.
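An episode-keyed buffer of that kind might look roughly like this (a sketch under assumed names, not the repo's actual code):

```python
import random

class EpisodeReplayBuffer:
    """Stores whole episodes; sampling draws a random episode,
    then a random window of steps within it."""

    def __init__(self):
        self.episodes = []  # each episode is a list of step records

    def add_episode(self, episode):
        self.episodes.append(episode)

    def sample(self, seq_len, rng=random):
        ep = rng.choice(self.episodes)  # uniform over stored episodes
        start = rng.randrange(max(1, len(ep) - seq_len + 1))
        return ep[start:start + seq_len]
```

Keeping episode boundaries intact also means a sampled sequence never straddles two episodes, which a flat step buffer has to handle explicitly.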

I hope this clarifies the implementation choice. If you have any further questions, please feel free to share them.
