
This repository contains the source code for Self-Evaluation Guided MCTS for online DPO.

Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning

This repository contains code and analysis for the paper: Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning. Below is the framework of our proposed method.

Model Framework

Environment Setup

conda env create --file conda-recipe.yaml
pip install -r requirements.txt

Dataset Download

Run MCTS-DPO

Our main code is in ./mcts_rl/algorithms/mcts and ./mcts_rl/trainers/tsrl_trainer.py.
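At a high level, the search collects step-level preference pairs by ranking sibling nodes in the tree. Below is a minimal, self-contained sketch of two core ideas — UCT-style selection and pairing a node's best and worst children as a (preferred, dispreferred) step — using simplified node dictionaries, not the repository's actual classes:

```python
import math

def ucb_score(total_value, visits, parent_visits, c=1.0):
    """UCT score: mean value plus an exploration bonus (unvisited nodes first)."""
    if visits == 0:
        return float("inf")
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)

def collect_preference_pair(children):
    """Rank a node's children by mean value and pair the best and worst
    reasoning steps as one (preferred, dispreferred) example for DPO."""
    ranked = sorted(children,
                    key=lambda ch: ch["value"] / max(ch["visits"], 1),
                    reverse=True)
    return ranked[0]["step"], ranked[-1]["step"]
```

In the actual trainer, node values combine outcome signals with the model's self-evaluation; the field names above (step, value, visits) are illustrative.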

To run MCTS-DPO for MathQA on Mistral (SFT):

bash scripts/mcts_mathqa.sh

To run MCTS-DPO for CSR (commonsense reasoning) on Mistral (SFT):

bash scripts/mcts_csr.sh
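The collected pairs are then used for iterative DPO updates. For reference, here is a minimal sketch of the standard DPO loss on a single (chosen, rejected) pair of sequence log-probabilities — the generic formulation, not the repository's exact implementation:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * margin), where the margin is the
    policy's log-prob advantage of chosen over rejected, measured relative to
    the frozen reference model. Inputs are summed token log-probabilities."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy prefers the chosen response more strongly than the reference does, the margin is positive and the loss drops below log 2; the beta value above is an illustrative default.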

Citation

@article{xie2024monte,
  title={Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning},
  author={Xie, Yuxi and Goyal, Anirudh and Zheng, Wenyue and Kan, Min-Yen and Lillicrap, Timothy P and Kawaguchi, Kenji and Shieh, Michael},
  journal={arXiv preprint arXiv:2405.00451},
  year={2024}
}

This repository is adapted from the Safe-RLHF codebase.
