PPO clips the probability ratio between the new and old policies so that it can safely train on the same batch of on-policy data for multiple gradient steps. DreamerV2 uses a world model and can therefore generate an unlimited amount of on-policy data without interacting with the environment, so there is little point in training on the same imagined trajectories multiple times.
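For reference, the clipping mechanism mentioned above looks roughly like this. This is an illustrative sketch of the standard PPO clipped surrogate objective, not code from either repository; all names here are my own:

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (to be maximized).

    Clips the probability ratio pi_new / pi_old to [1 - eps, 1 + eps],
    so that repeated gradient steps on the same batch cannot push the
    policy too far from the one that collected the data.
    """
    ratio = np.exp(logp_new - logp_old)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Elementwise minimum gives a pessimistic bound on the improvement.
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

When the policy has not moved (ratio = 1), the objective is just the mean advantage; once the ratio leaves the clip range, the gradient with respect to the policy vanishes for those samples, which is what makes multiple epochs on one batch safe.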
I think a lot of improvement could be made by using a PPO actor.