Process Reward Learning

This repository contains the code for Process Reward Learning (PRL).

Data Preparation

python examples/data_preprocess/numina_math.py
python examples/data_preprocess/math500.py

Training

bash runs/run_grpo.sh
