
Have you considered using a PPO actor instead of a normal Actor-Critic? #2

Closed
outdoteth opened this issue Mar 6, 2021 · 1 comment

Comments

@outdoteth

I think a lot of improvement could be made by using a PPO actor.

@danijar (Owner) commented Mar 6, 2021

PPO clips the importance ratio between the new and old policy so that it can safely take multiple gradient steps on the same batch of on-policy data. DreamerV2 uses a world model and can therefore generate an unlimited amount of on-policy data without interacting with the environment, so there is little point in training on the same imagined trajectories multiple times.
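For illustration, here is a minimal sketch (plain NumPy, with made-up log-probabilities and advantages, not code from this repository) contrasting the PPO clipped surrogate objective with the unclipped policy-gradient objective a plain actor-critic uses. The clipping of the importance ratio is what lets PPO reuse the same batch for several gradient steps; when fresh on-policy rollouts can be imagined cheaply with a world model, that protection buys little.

```python
import numpy as np

def ppo_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate objective (to be maximized).

    The importance ratio pi_new / pi_old is clipped to [1 - eps, 1 + eps],
    keeping the update conservative when the same batch of data is reused
    for several gradient steps.
    """
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))

def vanilla_pg_surrogate(logp_new, advantages):
    """Plain policy-gradient objective (REINFORCE with a baseline).

    No clipping: it assumes the data was sampled from the current policy,
    which is easy to guarantee when trajectories are re-imagined by the
    world model before every gradient step.
    """
    return np.mean(logp_new * advantages)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical batch of 8 action log-probabilities and advantages.
    logp_old = rng.normal(-1.0, 0.3, size=8)
    logp_new = logp_old + rng.normal(0.0, 0.5, size=8)  # policy has drifted
    adv = rng.normal(0.0, 1.0, size=8)
    print("PPO clipped surrogate:", ppo_surrogate(logp_new, logp_old, adv))
    print("Vanilla PG surrogate :", vanilla_pg_surrogate(logp_new, adv))
```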

@danijar closed this as completed Mar 6, 2021