Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".
Updated Feb 19, 2025 - Python
A comprehensive collection of process reward models.
Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
[ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.
This repository contains the official implementation of "Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision".
[KDD 2025] Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners
This repository includes code and materials for the paper "Training Step-Level Reasoning Verifiers with Formal Verification Tools".