Gerçek DPO Fine-Tuning (İnce Ayar Projesi) (OSS Modellerle)

Bu proje, Direct Preference Optimization (DPO) ile açık kaynak bir dil modelini gerçekten eğitmek için uçtan uca çalışan bir kurulum sağlar. Kodlar simülasyon yapmaz; gerçek model, gerçek veri ve gerçek kayıplarla eğitilir. Varsayılan temel model: Qwen2.5-0.5B-Instruct (Apache-2.0).

Hızlı Başlangıç

Colab/Kaggle GPU açın.
pip install -r requirements.txt
Veri dönüştürme (gerçek veri):

python scripts/convert_hhrlhf_to_dpo.py --dataset_name Anthropic/hh-rlhf --split train --max_samples 20000 --output_jsonl data/hh_rlhf_train.jsonl
python scripts/convert_hhrlhf_to_dpo.py --dataset_name Anthropic/hh-rlhf --split test  --max_samples 2000  --output_jsonl data/hh_rlhf_eval.jsonl

Eğitim:

python train_dpo.py --base_model Qwen/Qwen2.5-0.5B-Instruct --train_jsonl data/hh_rlhf_train.jsonl --eval_jsonl data/hh_rlhf_eval.jsonl --output_dir models/qwen25-0_5b-dpo --learning_rate 1e-5 --per_device_train_batch_size 4 --per_device_eval_batch_size 4 --gradient_accumulation_steps 4 --max_steps 1000 --eval_steps 100 --save_steps 200 --lora_r 16 --lora_alpha 32 --lora_dropout 0.05 --bf16

Güncelleme :

python train_dpo.py \
  --base_model Qwen/Qwen2.5-0.5B-Instruct \
  --train_jsonl data/hh_rlhf_train.jsonl \
  --eval_jsonl  data/hh_rlhf_eval.jsonl \
  --output_dir  models/qwen25-0_5b-dpo \
  --learning_rate 1e-5 \
  --per_device_train_batch_size 1 \
  --per_device_eval_batch_size 1 \
  --gradient_accumulation_steps 16 \
  --max_steps 50 \
  --eval_steps 50 \
  --save_steps 50

Değerlendirme:

python evaluate_pairwise.py --base_model Qwen/Qwen2.5-0.5B-Instruct --adapter_path models/qwen25-0_5b-dpo --eval_jsonl data/hh_rlhf_eval.jsonl --max_eval_samples 500 --report_path outputs/eval_report.json

Güncelleme :

python -u evaluate_pairwise.py \
  --base_model Qwen/Qwen2.5-0.5B-Instruct \
  --adapter_path models/qwen25-0_5b-dpo \
  --eval_jsonl data/hh_rlhf_eval.jsonl \
  --max_eval_samples 500 \
  --report_path outputs/eval_report.json

İnferans:

python infer_chat.py --base_model Qwen/Qwen2.5-0.5B-Instruct --adapter_path models/qwen25-0_5b-dpo

Notlar

4-bit QLoRA GPU var ise otomatik devreye girer. CPU'da da çalışır (yavaş).
Alternatif temel modeller: TinyLlama/TinyLlama-1.1B-Chat-v1.0, Qwen/Qwen2.5-1.5B-Instruct.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Gerçek DPO Fine-Tuning (İnce Ayar Projesi) (OSS Modellerle)

Hızlı Başlangıç

Notlar

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
data		data
scripts		scripts
README.md		README.md
evaluate_pairwise.py		evaluate_pairwise.py
infer_chat.py		infer_chat.py
requirements.txt		requirements.txt
train_dpo.py		train_dpo.py

Folders and files

Latest commit

History

Repository files navigation

Gerçek DPO Fine-Tuning (İnce Ayar Projesi) (OSS Modellerle)

Hızlı Başlangıç

Notlar

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages