Unit test Prioritised Experience Replay Memory #16
I am not sure whether what I am about to say reflects the correct logic behind PER. What the current code does: in the training loop, when we call mem.append(), the transition is stored with a default priority, transitions.max(). Shouldn't we instead calculate the priority before appending, and append with that priority? This would keep the complexity the same and attach the correct priority to the sample right away. To the best of my knowledge, the paper does not specify this level of detail.
Adding new transitions with the max priority is line 6 of the algorithm in the PER paper; the initial value, 1, is given in line 2. Also, calculating the priority at insertion means having access to the future states (even more states when calculating multi-step returns) and doing the whole target calculation on a single sample, so it's not that cheap.
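To make the append-with-max-priority behaviour concrete, here is a minimal, hedged sketch (an illustrative array-based memory, not the repository's actual sum-tree implementation; the class name SimplePER and its method names are assumptions for this example):

```python
import numpy as np

class SimplePER:
    """Minimal prioritised replay sketch (array-based; real implementations use a sum tree)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = [None] * capacity
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0
        self.size = 0
        self.max_priority = 1.0  # line 2 of the PER algorithm: p_1 = 1

    def append(self, transition):
        # Line 6 of the PER algorithm: new transitions get the current max priority,
        # guaranteeing they are sampled at least once before being re-prioritised.
        self.data[self.pos] = transition
        self.priorities[self.pos] = self.max_priority
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, alpha=0.5):
        # Sample indices with probability proportional to priority^alpha.
        probs = self.priorities[:self.size] ** alpha
        probs /= probs.sum()
        idxs = np.random.choice(self.size, batch_size, p=probs)
        return idxs, [self.data[i] for i in idxs]

    def update_priorities(self, idxs, td_errors, eps=1e-6):
        # After learning, priorities become |TD error|, and the running max is updated.
        new_priorities = np.abs(td_errors) + eps
        self.priorities[idxs] = new_priorities
        self.max_priority = max(self.max_priority, new_priorities.max())
```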
I just read about this in the paper "Distributed Prioritized Experience Replay" by D. Horgan et al. Interestingly, they changed this behaviour because appending with the max priority did not scale well (that paper is all about learning with a lot of different actors). A sketch of their alternative follows below.
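For contrast, a hedged sketch of the Ape-X-style alternative, where each actor computes an initial priority from a one-step TD error it can obtain almost for free while acting (the helper names q_online and q_target are hypothetical):

```python
import numpy as np

def initial_priority(q_online, q_target, s, a, r, s_next, done, gamma=0.99):
    """Hypothetical actor-side priority: |one-step TD error| of a new transition."""
    # q_online/q_target are assumed to map a state to a vector of Q-values.
    # An Ape-X actor already computes Q-values to pick actions, so this is cheap
    # locally, whereas the learner would pay the full target computation per sample.
    q_sa = q_online(s)[a]
    target = r if done else r + gamma * np.max(q_target(s_next))
    return abs(target - q_sa)  # used as the transition's priority on insertion
```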
@Kaixhin Yep, I understand that now, given your point about n-step TD returns.
Results on 3 games so far look promising, so closing unless a specific problem is identified. |
PER was reported to cause issues (decreasing the performance of a DQN) when it was ported to another codebase. Although PER can legitimately cause performance to decrease, it is still likely that there is a bug in this implementation.
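Since this issue asks for unit tests of the prioritised replay memory, here is a hedged sketch of one such test, written against the SimplePER sketch above: with alpha = 1, empirical sampling frequencies should be roughly proportional to the stored priorities.

```python
import numpy as np

def test_sampling_proportional_to_priority():
    mem = SimplePER(capacity=4)
    for t in range(4):
        mem.append(t)  # store dummy transitions 0..3
    # Give the slots distinct priorities, then check empirical sampling frequencies.
    mem.update_priorities(np.arange(4), td_errors=np.array([1.0, 2.0, 3.0, 4.0]))
    counts = np.zeros(4)
    for _ in range(2000):
        idxs, _ = mem.sample(batch_size=8, alpha=1.0)
        for i in idxs:
            counts[i] += 1
    expected = mem.priorities / mem.priorities.sum()
    observed = counts / counts.sum()
    # 16000 draws keep the sampling noise well below the tolerance.
    assert np.allclose(observed, expected, atol=0.05)
```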