feat: align with oat's multi-turn ppo apis #85

Merged: lkevinzc merged 1 commit into main from oat-ppo on Sep 15, 2025
Conversation

@lkevinzc
Contributor

Oat now has dedicated support for multi-turn PPO for general agentic RL use cases (sail-sg/oat#63). We now follow these new APIs for future development.

Note that turn-level critic learning is natively supported by Oat's new PR, so it can be enabled simply by passing --critic_type ppo. Also note that the original "Return Batch Normalization" is replaced by "Advantage Whitening" on Oat's side, which is more general: for example, PPO's GAE advantages can go through the same normalization.
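To illustrate why whitening applies equally to GAE output, here is a minimal sketch of turn-level GAE followed by advantage whitening. It is not Oat's implementation: the function names `gae` and `whiten`, the defaults for `gamma`, `lam`, and `eps`, and the choice to whiten over a flat list of advantages are all assumptions made for illustration.

```python
from statistics import fmean, pstdev


def gae(rewards, values, gamma=1.0, lam=1.0):
    """Turn-level GAE for a single episode (hypothetical helper).

    rewards[t] is the reward received after turn t and values[t] is the
    critic's estimate at turn t. The value after the last turn is taken as 0.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(rewards) else 0.0
        # One-step TD error at turn t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of the TD errors from turn t onward.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def whiten(advantages, eps=1e-8):
    """Shift to zero mean and scale to (roughly) unit standard deviation."""
    mean = fmean(advantages)
    std = pstdev(advantages)
    return [(a - mean) / (std + eps) for a in advantages]


# Usage: three turns, reward only at the end, constant critic estimates.
adv = gae([0.0, 0.0, 1.0], [0.5, 0.5, 0.5], gamma=1.0, lam=0.0)
normalized = whiten(adv)
```

Because `whiten` only sees a list of advantages, the same step works whether those advantages come from GAE or from plain returns, which is the generality the description refers to.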

@lkevinzc
Contributor Author

Result on game:Sudoku-v0-easy:

image
