Our code is implemented on top of OpenRLHF. Please follow OpenRLHF's guidance to configure the required environment, then run pip install -r requirements.txt
For one training cycle, run the commands below; then adjust the temperature in step 1 and start a new collect-train cycle.
# 1. collect 8K math data
bash sh/collect_data.sh
# 2. make VR pairs dataset for DPO
bash sh/make_vr_pairs.sh
# 3. train the dpo model
bash sh/train_dpo.sh
# adjust the temperature in 1., then start a new collect-train cycle.

We used Qwen Math's codebase for evaluation (i.e., pass@1 accuracy):

bash sh/evaluate_all_bench.sh
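The collect-train cycle above can be sketched as a single loop. This is a dry-run sketch, not the repo's actual driver: the TEMPERATURE variable and the temperature schedule are hypothetical assumptions (adapt them to however sh/collect_data.sh actually takes its temperature), and `run` only echoes each command so the sketch is safe to execute as-is.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run of the collect-train cycle described in this README.
# "run" only echoes each command; swap the echo for real execution once the
# sh/*.sh scripts are configured for your environment.
set -euo pipefail

run() { echo "+ $*"; }

for TEMPERATURE in 0.7 0.9 1.1; do  # hypothetical temperature schedule
  export TEMPERATURE                # assumption: collect_data.sh reads this
  run bash sh/collect_data.sh       # 1. collect 8K math data
  run bash sh/make_vr_pairs.sh      # 2. make VR pairs dataset for DPO
  run bash sh/train_dpo.sh          # 3. train the DPO model
done
```

Replacing `echo "+ $*"` with `"$@"` in `run` turns the dry run into the real loop while keeping the per-step logging.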