Sakura-SOLAR-DPO

This model was developed by the LLM research consortium of (주)미디어그룹사람과숲 and (주)마커.

Sakura-SOLAR project:
This repository documents almost everything about the Sakura-SOLAR models, which held global Rank 1 on the Open LLM Leaderboard in December 2023.
I hope the open-source ecosystem keeps growing!😄😄

Contents

(Quick) Model lists

Introduction

  • I created the 🌸kyujinpy/Sakura-SOLAR-Instruct LLM, which reached Rank 1 on the Open LLM Leaderboard.
  • I love open-source, so I want to share everything about the model that took first place.
  • I hope this repository helps a lot of people.😎😎

News

  • 2023.12.28
    • Rank1 (Open LLM leaderboard): 🌸kyujinpy/Sakura-SOLAR-Instruct

Model Performance

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 🌸kyujinpy/Sakura-SOLAR-Instruct | 74.40 | 70.99 | 88.42 | 66.33 | 71.79 | 83.66 | 65.20 |
| 🌸🐋kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2 | 74.17 | 71.25 | 88.52 | 66.13 | 72.16 | 83.03 | 63.91 |
| 🌸kyujinpy/Sakura-SOLAR-Instruct-DPO-v2 | 74.14 | 70.90 | 88.41 | 66.48 | 71.86 | 83.43 | 63.76 |
| 🌸🐋kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v1 | 74.13 | 71.25 | 88.48 | 66.21 | 72.12 | 82.87 | 63.84 |
| 🌸🐋kyujinpy/Sakura-SOLRCA-Instruct-DPO | 74.05 | 71.16 | 88.49 | 66.17 | 72.10 | 82.95 | 63.46 |
| SOLAR-10.7B-Instruct-v1.0 | 74.20 | 71.08 | 88.16 | 66.21 | 71.43 | 83.58 | 64.75 |
| Mixtral-8x7B-Instruct-v0.1 | 72.62 | 70.22 | 87.63 | 71.16 | 64.58 | 81.37 | 60.73 |

Follow the Open LLM Leaderboard link for the latest scores.

Training code

1. Merge

  1. First, download mergekit.
  2. Run the command below to perform the merge.
# Example)
mergekit-yaml ./config.yml ./Sakura-SOLAR [--cuda]
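
After the merge finishes, the output directory can be loaded like any Hugging Face checkpoint. A minimal sanity-check sketch, assuming the merged weights were written to ./Sakura-SOLAR as in the command above:

# Example) load the merged model to verify it (output path assumed from the command above)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./Sakura-SOLAR")
model = AutoModelForCausalLM.from_pretrained(
    "./Sakura-SOLAR", torch_dtype=torch.float16, device_map="auto"
)
print(model.config.num_hidden_layers)  # expect 48, matching the layer_range in the config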

2. DPO

  1. Run the command below for DPO training (a rough sketch of the underlying training loop follows the command).
# Example)
python DPO.py \
    --base_model kyujinpy/Sakura-SOLAR-Instruct \
    --data-path  kyujinpy/orca_math_dpo \
    --output_dir [...output_dir...] \
    --num_epochs [...epoch...] \
    --batch_size [...batch_size...] \
    --micro_batch_size [...micro_batch...] \
    --learning_rate [...learning_rate...] \
    --lora_r 16 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --lora_target_modules [...target_modules...] \
    --lr_scheduler 'linear' \
    --warmup_ratio 0.1 \
    --cutoff_len 4096
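
DPO.py itself lives in this repository; the snippet below is only a rough sketch of the kind of loop such a script implies, written against TRL's DPOTrainer (circa the late-2023 API) with a PEFT LoRA config. Dataset column names, the batch split, and output paths are assumptions, not the script's exact settings.

# Rough sketch only — not the repository's DPO.py. Assumes TRL's DPOTrainer and that
# the preference dataset exposes "prompt"/"chosen"/"rejected" columns.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "kyujinpy/Sakura-SOLAR-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, load_in_8bit=True, device_map="auto")

train_dataset = load_dataset("kyujinpy/orca_math_dpo", split="train")

peft_config = LoraConfig(
    r=16, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj", "lm_head"],
    task_type="CAUSAL_LM",
)

args = TrainingArguments(
    output_dir="./sakura-dpo-lora",       # placeholder for [...output_dir...]
    num_train_epochs=1,
    per_device_train_batch_size=2,        # micro batch size
    gradient_accumulation_steps=16,       # 2 x 16 = effective batch size 32
    learning_rate=1e-6,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    optim="paged_adamw_32bit",
    bf16=True,
)

trainer = DPOTrainer(
    model,
    ref_model=None,        # with a PEFT adapter, the frozen base acts as the reference model
    args=args,
    beta=0.1,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=4096,
)
trainer.train()
trainer.save_model(args.output_dir)       # saves the LoRA adapter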
  2. Merge the base model with the trained LoRA adapter (a sketch of a typical merge script follows the command):
python merge.py \
    --base_model_name_or_path kyujinpy/Sakura-SOLAR-Instruct \
    --peft_model_path [...output_dir...] \
    --output_dir [...output_final_dir...]
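
merge.py is also in the repository; the sketch below shows the standard PEFT way to fold a LoRA adapter into the base weights, which is presumably what the script does. Paths are placeholders.

# Rough sketch of the LoRA-merge step (paths are placeholders, not the real script).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "kyujinpy/Sakura-SOLAR-Instruct"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(model, "./sakura-dpo-lora")   # the DPO output_dir
model = model.merge_and_unload()                                # bake the LoRA deltas into the weights

model.save_pretrained("./Sakura-SOLAR-Instruct-DPO")
AutoTokenizer.from_pretrained(base).save_pretrained("./Sakura-SOLAR-Instruct-DPO")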

Hyperparameters & Prompt

  • 😎kyujinpy/Sakura-SOLAR-Instruct
slices:
  - sources:
      - model: VAGOsolutions/SauerkrautLM-SOLAR-Instruct
        layer_range: [0, 48]
      - model: upstage/SOLAR-10.7B-Instruct-v1.0
        layer_range: [0, 48]
        
merge_method: slerp
base_model: upstage/SOLAR-10.7B-Instruct-v1.0

parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5 # fallback for rest of tensors
tokenizer_source: union
    
dtype: float16
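
For context, mergekit treats each `t` list above as gradient anchor points that are spread across the 48 layers; the snippet below is a hedged illustration of that interpolation, not mergekit's actual code.

# Example) illustration only — approximates how a gradient list of t values could be
# spread over 48 layers as per-layer slerp weights; mergekit's internals may differ.
import numpy as np

anchors = [0, 0.5, 0.3, 0.7, 1]                    # the self_attn schedule above
xs = np.linspace(0, 1, num=len(anchors))
t_per_layer = np.interp(np.linspace(0, 1, num=48), xs, anchors)
print(t_per_layer.round(2))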

  • 😎kyujinpy/Sakura-SOLAR-Instruct-DPO-v1

| Hyperparameter | kyujinpy/Sakura-SOLAR-Instruct-DPO-v1 |
| --- | --- |
| LoRA method | LoRA |
| load_in_8bit | True |
| learning rate | 1e-6 |
| batch size | 32 |
| micro batch size | 2 |
| warmup ratio | 0.1 |
| epochs | 1 |
| weight decay | 0. |
| lr scheduler | linear |
| lora alpha | 16 |
| lora rank | 16 |
| lora dropout | 0.05 |
| beta | 0.1 |
| optim | adamw_torch |
| bf16 | True |
| lora target modules | embed_tokens, q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head |
| cutoff length | 4096 |
| Datasets | argilla/distilabel-math-preference-dpo |
| Base Model | kyujinpy/Sakura-SOLAR-Instruct |

Prompt template:

### User:

### Assistant:


  • 😎kyujinpy/Sakura-SOLAR-Instruct-DPO-v2

| Hyperparameter | kyujinpy/Sakura-SOLAR-Instruct-DPO-v2 |
| --- | --- |
| LoRA method | LoRA |
| load_in_8bit | True |
| learning rate | 1e-5 |
| batch size | 32 |
| micro batch size | 2 |
| warmup ratio | 0.1 |
| epochs | 1 |
| weight decay | 0. |
| lr scheduler | linear |
| lora alpha | 16 |
| lora rank | 16 |
| lora dropout | 0.05 |
| beta | 0.1 |
| optim | paged_adamw_32bit |
| bf16 | True |
| lora target modules | embed_tokens, q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head |
| cutoff length | 4096 |
| Datasets | argilla/distilabel-math-preference-dpo |
| Base Model | kyujinpy/Sakura-SOLAR-Instruct |

Prompt template:

### User:

### Assistant:

  • 😎kyujinpy/Sakura-SOLRCA-Instruct-DPO

| Hyperparameter | kyujinpy/Sakura-SOLRCA-Instruct-DPO |
| --- | --- |
| LoRA method | LoRA |
| load_in_8bit | True |
| learning rate | 5e-7 |
| batch size | 32 |
| micro batch size | 1 |
| warmup ratio | 0.1 |
| epochs | 1 |
| weight decay | 0. |
| lr scheduler | linear |
| lora alpha | 16 |
| lora rank | 16 |
| lora dropout | 0.05 |
| beta | 0.1 |
| optim | paged_adamw_32bit |
| bf16 | True |
| lora target modules | embed_tokens, q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head |
| cutoff length | 4096 |
| Datasets | Intel/orca_dpo_pairs |
| Base Model | kyujinpy/Sakura-SOLAR-Instruct |

Prompt template:

### User:

### Assistant:

  • 😎kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v1

| Hyperparameter | kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v1 |
| --- | --- |
| LoRA method | LoRA |
| load_in_8bit | True |
| learning rate | 5e-7 |
| batch size | 32 |
| micro batch size | 2 |
| warmup ratio | 0.1 |
| epochs | 1 |
| weight decay | 0. |
| lr scheduler | linear |
| lora alpha | 16 |
| lora rank | 16 |
| lora dropout | 0.05 |
| beta | 0.1 |
| optim | paged_adamw_32bit |
| bf16 | True |
| lora target modules | embed_tokens, q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head |
| cutoff length | 4096 |
| Datasets | kyujinpy/orca_math_dpo |
| Base Model | kyujinpy/Sakura-SOLAR-Instruct |

Prompt template:

### User:

### Assistant:

  • 😎kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2

| Hyperparameter | kyujinpy/Sakura-SOLRCA-Math-Instruct-DPO-v2 |
| --- | --- |
| LoRA method | LoRA |
| load_in_8bit | True |
| learning rate | 5e-7 |
| batch size | 32 |
| micro batch size | 2 |
| warmup ratio | 0.1 |
| epochs | 1 |
| weight decay | 0. |
| lr scheduler | linear |
| lora alpha | 16 |
| lora rank | 16 |
| lora dropout | 0.05 |
| beta | 0.1 |
| optim | paged_adamw_32bit |
| bf16 | True |
| lora target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head |
| cutoff length | 4096 |
| Datasets | kyujinpy/orca_math_dpo |
| Base Model | kyujinpy/Sakura-SOLAR-Instruct |

Prompt template:

### User:

### Assistant:
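
All of the models above share the same "### User:" / "### Assistant:" prompt format. A minimal generation sketch using it; the exact newline layout, the sample question, and the decoding settings are illustrative assumptions:

# Example) formatting the prompt for inference with one of the models above
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kyujinpy/Sakura-SOLAR-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "### User:\nWhat is the derivative of x**2?\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))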


TODO

  • Share code
  • Share hyperparameters
  • Share datasets

References
