DreamGen fine-tuning scripts

Super simple scripts for supervised fine-tuning and preference-tuning, mostly based on Unsloth. Fork it, and do whatever you want with it.

Installation

  • Install Unsloth.
  • Install other dependencies: pip install fire.

dpo.py
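
For orientation, the core of a DPO preference-tuning run with Unsloth and TRL's DPOTrainer looks roughly like the sketch below. This is a minimal illustration, not the actual contents of dpo.py: the model name, dataset path, column names, and hyperparameters are placeholders, and the exact DPOTrainer arguments vary between TRL versions.

```python
# Minimal sketch of a DPO run with Unsloth + TRL. Illustration of the general shape only,
# not the actual dpo.py. Model name, dataset path, and hyperparameters are placeholders,
# and the DPOTrainer signature differs slightly between TRL versions.
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import DPOTrainer

# Load a 4-bit base model (ideally your SFT checkpoint) and attach LoRA adapters.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path/to/your-sft-model",  # placeholder
    max_seq_length=4096,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Preference data with "prompt", "chosen", and "rejected" columns (placeholder path).
dataset = load_dataset("json", data_files="dpo_data.jsonl", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # with LoRA adapters, TRL derives the reference model implicitly
    beta=0.1,              # DPO temperature, worth sweeping
    loss_type="sigmoid",   # "sigmoid" for DPO, "ipo" for IPO
    train_dataset=dataset,
    tokenizer=tokenizer,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=1e-5,  # roughly 10x lower than a typical QLoRA SFT learning rate
        num_train_epochs=1,
        output_dir="dpo-out",
    ),
)
trainer.train()
```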

Tips

Experiment with different learning rates, beta values, and DPO vs. IPO loss. The best values are highly data-dependent. Start with a learning rate approximately 10 times smaller than the one you normally use for QLoRA SFT (for example, if your SFT learning rate is 1e-4, start DPO around 1e-5).

Monitor (train|eval)/rewards/accuracies, (train|eval)/rewards/margins and train/loss. They should not improve too fast; if they do, lower the learning rate.
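
One quick way to eyeball these curves without a dashboard is the illustrative helper below. It assumes `trainer` is a DPOTrainer instance (as in the sketch above) logging through the Hugging Face Trainer machinery; the metric key names can differ between TRL versions.

```python
# Print logged DPO metrics from the trainer's log history. Illustrative only: assumes
# `trainer` is a DPOTrainer instance and that your TRL version logs these key names.
def print_reward_curve(trainer, keys=("loss", "rewards/accuracies", "rewards/margins")):
    for entry in trainer.state.log_history:
        row = {k: entry[k] for k in keys if k in entry}
        if row:
            print(f"step {entry.get('step')}: {row}")

print_reward_curve(trainer)
```

If accuracies jump to near 1.0 and margins keep growing within the first few hundred steps, that is usually the "improving too fast" pattern described above, and a lower learning rate is worth trying.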

Never trust the (train|eval)/rewards metrics alone; perform end-to-end testing on a dataset that does not overlap with your SFT and DPO data.
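
The non-overlap requirement is easy to verify mechanically. Below is a hypothetical helper (file names and the "prompt" field are placeholders for your own data format) that checks eval prompts never appear in the SFT or DPO training sets.

```python
# Verify that eval prompts are disjoint from the SFT and DPO training prompts.
# Hypothetical helper: file names and the "prompt" field are placeholders.
import hashlib
import json

def prompt_hashes(path, field="prompt"):
    with open(path, encoding="utf-8") as f:
        return {
            hashlib.sha256(json.loads(line)[field].strip().encode("utf-8")).hexdigest()
            for line in f if line.strip()
        }

train_hashes = prompt_hashes("sft_data.jsonl") | prompt_hashes("dpo_data.jsonl")
eval_hashes = prompt_hashes("eval_data.jsonl")

overlap = train_hashes & eval_hashes
assert not overlap, f"{len(overlap)} eval prompts also appear in the training data"
```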

Here are some example DPO DreamGen V1 7B runs, using different learning rates:

[Plots: rewards, train loss, and learning rate for each run]

Here are some example DPO Bagel runs:

Useful DPO reading material

End-to-end evals

End-to-end evals on data that's disjoint from your SFT and DPO training data are crucial for assessing real improvements. Ideally, your evals should be as close as possible to how you intend to use the final model. If you don't have that, you can use one of the existing broad-spectrum auto-evals: