I have been trying to get the 'feudal' policy to work on the 'PongDeterministic-v4' environment, but I have had no luck. The 'lstm' policy seems to work for me, but if I change it to 'feudal', the episode rewards do not increase even after 8 hours of training with 1 worker; they stay stuck at -20 on both the 'master' branch and the 'dilated_fix' branch.
I saw the other issues mentioning that it doesn't achieve the benchmarks from the paper, but is it supposed to work on Pong at least? Or am I doing something wrong?