A tiny single-file implementation of Group Relative Policy Optimization (GRPO) as introduced by the DeepSeekMath paper [1][2][3].
🆕 microGRPO now implements the GRPO improvements introduced by Dr. GRPO [4], Apple's LOOP [5], and Mistral's Magistral [6] (sketched in NumPy after the feature list below):
- 💥 Remove per-group advantage normalization [4]
- ⛳️ Leave-one-out advantage [5] (LOOP only)
- 🔥 Eliminate KL divergence [5]
- 🎢 Normalize loss [5]
- 🏆 Add per-batch advantage normalization [6] (Magistral only)
- 🚦 Relax trust region bounds [5]
- 🌈 Eliminate non-diverse groups [5]
- 🐭 Only ~300 lines of code
- 📦 In pure NumPy, with autograd to compute the gradient
- ✅ Type annotated and linted
- ✂️ Easily swap out the default game and train on any other game or environment
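
Concretely, these tweaks boil down to how advantages are computed and how the clipped surrogate is aggregated. The snippet below is a minimal, illustrative NumPy sketch of that logic, not the code in `microgrpo.py`; the function names, clipping bounds, and example rewards are assumptions:

```python
import numpy as np

def leave_one_out_advantages(rewards: np.ndarray) -> np.ndarray:
    """Leave-one-out advantage (LOOP): each rollout's reward minus the mean reward
    of the *other* rollouts in its group, with no per-group std normalization (Dr. GRPO)."""
    group_size = rewards.shape[-1]
    others_mean = (rewards.sum(axis=-1, keepdims=True) - rewards) / (group_size - 1)
    return rewards - others_mean

def batch_normalize(advantages: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-batch advantage normalization (Magistral): standardize across the whole batch."""
    return (advantages - advantages.mean()) / (advantages.std() + eps)

def clipped_surrogate(ratio: np.ndarray, advantages: np.ndarray,
                      eps_low: float = 0.2, eps_high: float = 0.3) -> np.ndarray:
    """PPO-style clipped surrogate with relaxed (asymmetric) trust-region bounds
    and no KL penalty; returns the per-step objective to maximize."""
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    return np.minimum(ratio * advantages, clipped * advantages)

# Example with 2 groups of 4 rollouts each; the second group is non-diverse.
rewards = np.array([[0.1, 0.5, 0.5, 0.9],
                    [1.0, 1.0, 1.0, 1.0]])
diverse = rewards.std(axis=-1) > 0          # drop groups whose rewards are all equal
advantages = batch_normalize(leave_one_out_advantages(rewards[diverse]))
ratios = np.full_like(advantages, 1.05)     # placeholder for π_θ(a|s) / π_old(a|s)
loss = -clipped_surrogate(ratios, advantages).mean()  # normalize by averaging over all steps
print(loss)
```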
> [!NOTE]
> You'll need to install uv to run the commands below.
To start teaching a policy to play a simplified version of Battleship, run:
```sh
uv run microgrpo.py
```
You should see the policy improve its average score from around 15% to about 50% over 2000 iterations.
The file is structured into five sections:
- 🕹️ Game (~50 lines): An implementation of the Battleship board game
- 🌍 Environment (~60 lines): The API with which an agent can interact with the game (a minimal interface sketch follows this list)
- 🧠 Policy (~30 lines): A model that produces action probabilities given the observed environment state
- 🎯 GRPO (~80 lines): The GRPO objective function and training data generator
- ⚡ Train (~50 lines): The loop that collects training data and optimizes the GRPO objective with AdamW
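
The Environment section is what you would replace to train on a different game. The protocol below is a hypothetical sketch of what such an interface could look like, together with a toy environment that satisfies it; the method names, signatures, and the `CoinFlipEnv` example are assumptions rather than the actual `BattleshipEnv` API:

```python
from typing import Protocol

import numpy as np

class Environment(Protocol):
    """Hypothetical environment interface; the real BattleshipEnv API may differ."""

    def reset(self) -> np.ndarray:
        """Start a new episode and return the initial observation."""
        ...

    def step(self, action: int) -> tuple[np.ndarray, float, bool]:
        """Apply an action and return (next observation, reward, episode done)."""
        ...

class CoinFlipEnv:
    """Toy single-step environment that satisfies the protocol: guess a coin flip."""

    def __init__(self, rng: np.random.Generator | None = None) -> None:
        self.rng = rng or np.random.default_rng()
        self.coin = 0

    def reset(self) -> np.ndarray:
        self.coin = int(self.rng.integers(2))
        return np.zeros(1)  # nothing is observable before guessing

    def step(self, action: int) -> tuple[np.ndarray, float, bool]:
        reward = 1.0 if action == self.coin else 0.0
        return np.zeros(1), reward, True  # episode ends after one guess
```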
Starting a training run only requires defining a `GRPOConfig` with your choice of environment (here, `BattleshipEnv`) and a function that evaluates the policy model given its parameters (here, `neural_battleship_policy`):
```python
# Define the environment and the policy model to optimize.
grpo_config = GRPOConfig(environment=BattleshipEnv, policy=neural_battleship_policy)

# Train the policy model by maximizing the GRPO objective with AdamW.
θ_star, rewards_val = train_grpo(θ_init := neural_battleship_policy_init(), grpo_config)
```
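
Under the hood, training is plain gradient ascent on the GRPO objective, with the autograd package differentiating through the pure-NumPy policy and an AdamW update applied to the parameters. The loop below is a generic illustration with a toy objective standing in for the GRPO surrogate; the hyperparameter values and function names are assumptions, not `train_grpo`'s internals:

```python
import autograd.numpy as anp  # NumPy drop-in that autograd can differentiate through
from autograd import grad

def adamw_step(theta, g, m, v, t, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8, wd=1e-2):
    """One AdamW update, written for gradient *ascent* on the objective."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    theta = theta + lr * m_hat / (anp.sqrt(v_hat) + eps) - lr * wd * theta
    return theta, m, v

# Toy stand-in for the GRPO objective: maximized at theta ≈ 1.
objective = lambda theta: -anp.sum((theta - 1.0) ** 2)
objective_grad = grad(objective)  # autograd computes ∇_θ objective(θ)

theta = anp.zeros(4)
m, v = anp.zeros_like(theta), anp.zeros_like(theta)
for t in range(1, 501):
    theta, m, v = adamw_step(theta, objective_grad(theta), m, v, t)
print(theta)  # ≈ 1, pulled slightly toward 0 by the decoupled weight decay
```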