DreamGen fine-tuning scripts

Super simple scripts for supervised fine-tuning and preference-tuning, mostly based on Unsloth. Fork it, and do whatever you want with it.

Installation

  • Install Unsloth.
  • Install other dependencies: pip install fire.

dpo.py
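
For orientation, the core of a DPO preference-tuning run with Unsloth and TRL's DPOTrainer looks roughly like the sketch below. This is a minimal illustration, not the actual contents of dpo.py: the model name, dataset path, column names, and hyperparameters are placeholders, and the exact DPOTrainer arguments vary between TRL versions.

```python
# Minimal sketch of a DPO run with Unsloth + TRL. Illustration of the general shape only,
# not the actual dpo.py. Model name, dataset path, and hyperparameters are placeholders,
# and the DPOTrainer signature differs slightly between TRL versions.
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import DPOTrainer

# Load a 4-bit base model (ideally your SFT checkpoint) and attach LoRA adapters.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path/to/your-sft-model",  # placeholder
    max_seq_length=4096,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Preference data with "prompt", "chosen", and "rejected" columns (placeholder path).
dataset = load_dataset("json", data_files="dpo_data.jsonl", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # with LoRA adapters, TRL derives the reference model implicitly
    beta=0.1,              # DPO temperature, worth sweeping
    loss_type="sigmoid",   # "sigmoid" for DPO, "ipo" for IPO
    train_dataset=dataset,
    tokenizer=tokenizer,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=1e-5,  # roughly 10x lower than a typical QLoRA SFT learning rate
        num_train_epochs=1,
        output_dir="dpo-out",
    ),
)
trainer.train()
```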

Tips

Experiment with different learning rates, beta values, and DPO vs. IPO loss. The best values are highly data-dependent. Start with a learning rate approximately 10 times smaller than the one you normally use for QLoRA SFT (for example, if your SFT learning rate is 1e-4, start DPO around 1e-5).

Monitor (train|eval)/rewards/accuracies, (train|eval)/rewards/margins and train/loss. They should not improve too fast; if they do, lower the learning rate.
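
One quick way to eyeball these curves without a dashboard is the illustrative helper below. It assumes `trainer` is a DPOTrainer instance (as in the sketch above) logging through the Hugging Face Trainer machinery; the metric key names can differ between TRL versions.

```python
# Print logged DPO metrics from the trainer's log history. Illustrative only: assumes
# `trainer` is a DPOTrainer instance and that your TRL version logs these key names.
def print_reward_curve(trainer, keys=("loss", "rewards/accuracies", "rewards/margins")):
    for entry in trainer.state.log_history:
        row = {k: entry[k] for k in keys if k in entry}
        if row:
            print(f"step {entry.get('step')}: {row}")

print_reward_curve(trainer)
```

If accuracies jump to near 1.0 and margins keep growing within the first few hundred steps, that is usually the "improving too fast" pattern described above, and a lower learning rate is worth trying.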

Never trust the (train|eval)/rewards metrics alone; perform end-to-end testing on a dataset that does not overlap with your SFT and DPO data.
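
The non-overlap requirement is easy to verify mechanically. Below is a hypothetical helper (file names and the "prompt" field are placeholders for your own data format) that checks eval prompts never appear in the SFT or DPO training sets.

```python
# Verify that eval prompts are disjoint from the SFT and DPO training prompts.
# Hypothetical helper: file names and the "prompt" field are placeholders.
import hashlib
import json

def prompt_hashes(path, field="prompt"):
    with open(path, encoding="utf-8") as f:
        return {
            hashlib.sha256(json.loads(line)[field].strip().encode("utf-8")).hexdigest()
            for line in f if line.strip()
        }

train_hashes = prompt_hashes("sft_data.jsonl") | prompt_hashes("dpo_data.jsonl")
eval_hashes = prompt_hashes("eval_data.jsonl")

overlap = train_hashes & eval_hashes
assert not overlap, f"{len(overlap)} eval prompts also appear in the training data"
```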

Here are some example DPO DreamGen V1 7B runs, using different learning rates:

[Plots: rewards, train loss, and learning rate for each run]

Here are some example DPO Bagel runs:

Useful DPO reading material

End-to-end evals

End-to-end evals on data that's disjoint from your SFT and DPO training data are crucial for assessing real improvements. Ideally, your evals should be as close as possible to how you intend to use the final model. If you don't have that, you can use one of the existing broad-spectrum auto-evals: