Hello, danijar! First of all, thanks for your work :)
I've been trying out dreamerv2 this past week and tried to reproduce the riverraid results. However, I was unsuccessful: the agent only reaches about 5k reward after almost 1e6 train steps. This is the latest result I got. If you need, I can attach tensorboard graphs later this week.
I made a small modification to the original code so it runs on multiple GPUs (tf.distribute.MirroredStrategy). Then I trained the agent to play Pong, and the return plot was similar to the one you posted in #8, so I figured it was fine. Also, in the riverraid output attached above, half of the run used precision=16 and half used precision=32, since a few other issues mentioned that precision 32 helped, especially #30. I did not do a full run with precision=32, though.
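For reference, here is a minimal sketch of the kind of MirroredStrategy wrapper I mean. This is not the actual modification I made to dreamerv2; the toy model and loss below are placeholders, and only the strategy/scope/`strategy.run` pattern is the point:

```python
import tensorflow as tf

# Create the strategy first; it discovers available GPUs (falls back to CPU).
strategy = tf.distribute.MirroredStrategy()

# Variables (model, optimizer) must be created inside the strategy scope
# so they are mirrored across replicas.
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    optimizer = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(x, y):
    def step_fn(x, y):
        # Per-replica forward/backward pass on that replica's data shard.
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.square(model(x) - y))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss
    per_replica_loss = strategy.run(step_fn, args=(x, y))
    # Average the per-replica losses into a single scalar for logging.
    return strategy.reduce(
        tf.distribute.ReduceOp.MEAN, per_replica_loss, axis=None)
```

One easy mistake with this pattern is computing loss averages or gradient scaling per replica instead of globally, which silently changes the effective learning rate, so that's one place I plan to double-check.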
Do you have any tips on what could be going wrong or what I could do to debug it?
Thanks so much!
What training curves are you getting? It's easy to make mistakes with tf.distribute. For debugging, I recommend running a few seeds on a single GPU with the original code from the repository here.
Yeah, I'm pretty sure it's some problem with my tf.distribute implementation. I ran the original code and it reached higher scores than I was getting, much sooner. Thanks for the insight!