feat: align with oat's multi-turn ppo apis by lkevinzc · Pull Request #85 · axon-rl/gem

lkevinzc · 2025-09-12T08:15:09Z

Oat now has dedicated support for multi-turn PPO for general agentic RL use cases (sail-sg/oat#63). This PR follows the new APIs so that future development builds on them.

Note that turn-level critic learning is natively supported by Oat's new PR, so it can be enabled simply by passing `--critic_type ppo`. Also note that the original "Return Batch Normalization" is now replaced by "Advantage Whitening" on Oat's side, which is more general: for example, PPO's GAE advantages can go through the same normalization.
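For readers unfamiliar with the terms, here is a rough sketch of what turn-level GAE followed by advantage whitening could look like. This is not Oat's actual implementation: the function names `gae` and `whiten`, their signatures, and the default `gamma`/`lam` values are all hypothetical and chosen for illustration only.

```python
import math


def gae(rewards, values, gamma=1.0, lam=0.95):
    """Turn-level GAE for one episode (hypothetical helper, not Oat's API).

    rewards[t] and values[t] are the reward and critic estimate at turn t.
    The value after the final turn is assumed to be 0 (episode terminated).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def whiten(advantages, eps=1e-8):
    """Shift a batch of advantages to zero mean and scale to unit variance."""
    n = len(advantages)
    mean = sum(advantages) / n
    var = sum((a - mean) ** 2 for a in advantages) / n
    std = math.sqrt(var)
    return [(a - mean) / (std + eps) for a in advantages]


# Example: a 3-turn episode with a sparse terminal reward.
adv = gae(rewards=[0.0, 0.0, 1.0], values=[0.2, 0.5, 0.8])
normalized = whiten(adv)
```

Because the whitening step only sees a flat batch of advantages, the same function applies whether they come from GAE with a learned critic or from plain returns, which is the sense in which it generalizes return normalization.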

lkevinzc · 2025-09-15T09:33:47Z

Result on game:Sudoku-v0-easy:

support oat's multi turn ppo apis

8bd716f

lkevinzc mentioned this pull request Sep 15, 2025

feat: support turn-level ppo for general agentic rl sail-sg/oat#63

Merged

lkevinzc merged commit 6bb6540 into main Sep 15, 2025

lkevinzc deleted the oat-ppo branch September 15, 2025 09:36

lkevinzc merged 1 commit into main from oat-ppo
