About the implementation of advantage function in PPO Agent #14

yaorong1996 · 2023-03-13T08:42:22Z

I found that the implementation in `PPOAgent` starting at line 514 of `grid/toy_grid_dag.py`, `adv = r + vsp * (1-d) - vs`, only computes the δ term (the one-step TD residual) from the original PPO paper. It is not the full advantage estimate, which sums discounted δ terms over the rest of the trajectory.

Is that a misunderstanding of your code, or of PPO?
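For reference, the PPO paper uses a truncated generalized advantage estimate, Â_t = Σ_{l=0}^{T−t−1} (γλ)^l δ_{t+l}, with δ_t = r_t + γV(s_{t+1}) − V(s_t). The line above computes only δ_t (with γ = 1), which is the special case λ = 0. Below is a minimal sketch of the full estimate, assuming `r`, `vs`, `vsp` and `d` hold per-step values for one trajectory in time order. The function name `gae_advantages` and the parameters `gamma` and `lam` are hypothetical and do not come from `toy_grid_dag.py`.

```python
from typing import List, Sequence


def gae_advantages(
    r: Sequence[float],    # rewards r_t
    vs: Sequence[float],   # value estimates V(s_t)
    vsp: Sequence[float],  # value estimates V(s_{t+1})
    d: Sequence[float],    # done flags: 1.0 if s_{t+1} is terminal, else 0.0
    gamma: float = 1.0,
    lam: float = 0.95,
) -> List[float]:
    """Hypothetical helper: generalized advantage estimation for one trajectory."""
    T = len(r)
    adv = [0.0] * T
    running = 0.0
    # Walk backwards so that A_t = delta_t + gamma * lam * (1 - d_t) * A_{t+1}.
    for t in reversed(range(T)):
        # One-step TD residual; this is what `adv = r + vsp * (1-d) - vs` computes when gamma = 1.
        delta = r[t] + gamma * vsp[t] * (1.0 - d[t]) - vs[t]
        # The (1 - d_t) mask stops accumulation across episode boundaries.
        running = delta + gamma * lam * (1.0 - d[t]) * running
        adv[t] = running
    return adv
```

With `lam=0.0` and `gamma=1.0` this returns exactly the per-step δ values the current code produces; with `lam=1.0` it returns the Monte Carlo return minus V(s_t).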
