You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi Maxim,
first of all, thank you so much for the book! It helps me a lot for my thesis!
Second, I think that the ExperienceReplayBuffer stores the second-last transition twice, which could bias the training if an environment only has a few steps (like mine).
Maybe I have overlooked something, but this is my minimal example showing the described behaviour:
Notice how the transition from state 2 to 3 is stored twice each time. I used the ptan version that you can get via pip today. Could you have a look into this?
Best regards!
The text was updated successfully, but these errors were encountered:
Hi Maxim,
first of all, thank you so much for the book! It helps me a lot for my thesis!
Second, I think that the ExperienceReplayBuffer stores the second-last transition twice, which could bias the training if an environment only has a few steps (like mine).
Maybe I have overlooked something, but this is my minimal example showing the described behaviour:
The output is:
Notice how the transition from state 2 to 3 is stored twice each time. I used the ptan version that you can get via pip today. Could you have a look into this?
Best regards!
The text was updated successfully, but these errors were encountered: