DARWIN is a decode-time alignment technique that uses a reward-guided tree search framework to align an LLM at inference time, achieving performance comparable to preference optimization on two instruction-following benchmarks.
Paper Link: https://arxiv.org/abs/2406.15193
To run DARWIN, check out the demo notebook. You can run DARWIN with just a few lines of code!
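The actual interface is shown in the demo notebook. As a rough, conceptual illustration of the idea (reward-guided search over candidate responses, refined over a few mutation cycles), here is a minimal sketch with toy stand-in functions; `generate_candidates`, `reward`, and `darwin_style_decode` are hypothetical names and not the repository's actual API:

```python
import random

# Toy stand-ins for an instruction-tuned LLM and a reward model; in practice
# these would be real models (e.g. loaded via Hugging Face transformers).
def generate_candidates(prompt, n):
    return [f"{prompt} -> response variant {random.randint(0, 999)}" for _ in range(n)]

def reward(prompt, response):
    # Placeholder scalar reward; a real reward model scores (prompt, response) pairs.
    return random.random()

def darwin_style_decode(prompt, n_candidates=4, n_mutation=1, iterations=3):
    """Conceptual sketch: reward-guided selection with evolutionary-style refinement."""
    # Start from an initial pool of sampled responses, scored by the reward model.
    pool = generate_candidates(prompt, n_candidates)
    best = max(pool, key=lambda r: reward(prompt, r))
    history = [best]
    for _ in range(iterations):
        # "Mutate" the current best response by resampling candidates around it,
        # then keep whichever response scores highest under the reward model.
        mutants = generate_candidates(best, n_candidates * n_mutation)
        best = max(mutants + [best], key=lambda r: reward(prompt, r))
        history.append(best)
    return best, history

if __name__ == "__main__":
    output, past_outputs = darwin_style_decode("Explain decoding-time alignment in one sentence.")
    print(output)
```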
To run evaluation on the AlpacaEval benchmark, use the following command:
```bash
python3 alpaca_generate.py --method='darwin' --model_name='meta-llama/Meta-Llama-3-8B-Instruct' --range='0-805' --replacement_period=40 --iteration=3 --n_mutation=1
```
The results will be saved in a JSON file, where the 'past_outputs' field contains a list of outputs: the original output followed by the outputs from mutation cycles 1, 2, and 3. Please format the output into the alpaca_eval format from https://github.com/tatsu-lab/alpaca_eval
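As a rough sketch of that conversion step: the alpaca_eval format is a list of (instruction, output, generator) records. The file paths, the 'instruction' field name, and the generator label below are assumptions for illustration; only 'past_outputs' comes from the results file described above.

```python
import json

# Assumed input/output paths; adapt to the actual results file produced by alpaca_generate.py.
with open("darwin_results.json") as f:
    results = json.load(f)

alpaca_eval_records = []
for record in results:
    alpaca_eval_records.append({
        "instruction": record["instruction"],       # assumed field name for the prompt
        # 'past_outputs' holds the original output followed by mutation cycles 1-3;
        # take the output of the final mutation cycle as the response to evaluate.
        "output": record["past_outputs"][-1],
        "generator": "darwin-llama-3-8b-instruct",  # any identifying name for your method
    })

with open("darwin_alpaca_eval.json", "w") as f:
    json.dump(alpaca_eval_records, f, indent=2)
```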
If you use DARWIN in your publication, please cite it using the following BibTeX entry:
@misc{hung2024darwin,
      title={Reward Steering with Evolutionary Heuristics for Decoding-time Alignment},
      author={Chia-Yu Hung and Navonil Majumder and Ambuj Mehrish and Soujanya Poria},
      year={2024},
      eprint={2406.15193},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.15193},
}