[Baseline] LLaMA2-7B RLHF training curves #263

Open · hijkzzz opened this issue Apr 9, 2024 · 2 comments

hijkzzz (Collaborator) commented Apr 9, 2024

deepspeed ./train_ppo.py \
    --pretrain OpenLLMAI/Llama-2-7b-sft-model-ocra-500k \
    --reward_pretrain OpenLLMAI/Llama-2-7b-rm-anthropic_hh-lmsys-oasst-webgpt \
    --save_path ./ckpt/7b_llama \
    --save_steps -1 \
    --logging_steps 1 \
    --eval_steps -1 \
    --micro_train_batch_size 2 \
    --train_batch_size 128 \
    --micro_rollout_batch_size 4 \
    --rollout_batch_size 1024 \
    --max_epochs 1 \
    --prompt_max_len 1024 \
    --generate_max_len 1024 \
    --zero_stage 2 \
    --bf16 \
    --actor_learning_rate 5e-7 \
    --critic_learning_rate 9e-6 \
    --init_kl_coef 0.01 \
    --prompt_data Open-Orca/OpenOrca,Dahoas/full-hh-rlhf,tasksource/oasst1_pairwise_rlhf_reward \
    --prompt_data_probs 0.4,0.5,0.1 \
    --max_samples 80000 \
    --normalize_reward \
    --adam_offload \
    --flash_attn \
    --gradient_checkpointing

[image: PPO training curves for LLaMA2-7B]
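For readers unfamiliar with some of these flags, here is a minimal sketch (illustrative only, not OpenRLHF's actual implementation) of the reward shaping that `--init_kl_coef` and `--normalize_reward` typically control in PPO-based RLHF: the reward-model score is optionally standardized with running statistics, and a per-token KL penalty against the frozen SFT reference policy keeps the actor from drifting. The function and variable names below are hypothetical.

```python
import torch

def shaped_rewards(rm_score, actor_logprobs, ref_logprobs,
                   kl_coef=0.01, running_mean=0.0, running_std=1.0,
                   normalize_reward=True):
    """Illustrative PPO-RLHF reward shaping (not OpenRLHF's exact code).

    rm_score:       (batch,)      scalar reward-model score per response
    actor_logprobs: (batch, seq)  log-probs of generated tokens under the actor
    ref_logprobs:   (batch, seq)  log-probs under the frozen SFT/reference model
    """
    if normalize_reward:
        # --normalize_reward: standardize RM scores so their scale
        # stays comparable across prompts and datasets
        rm_score = (rm_score - running_mean) / (running_std + 1e-8)

    # Per-token KL estimate between actor and reference policy
    kl = actor_logprobs - ref_logprobs                 # (batch, seq)

    # --init_kl_coef: weight of the KL penalty that keeps the
    # actor close to the SFT model during PPO updates
    rewards = -kl_coef * kl                            # penalty on every token
    rewards[:, -1] += rm_score                         # RM score on the final token
    return rewards
```

With `--init_kl_coef 0.01` the KL term starts small; some implementations also adapt this coefficient during training to hold a target KL.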

@hijkzzz changed the title from "LLaMA2-7B RLHF curves" to "LLaMA2-7B Ray+RLHF curves" on Apr 9, 2024
@hijkzzz changed the title from "LLaMA2-7B Ray+RLHF curves" to "LLaMA2-7B Ray+RLHF+default setting training curves" on Apr 9, 2024
@hijkzzz changed the title from "LLaMA2-7B Ray+RLHF+default setting training curves" to "[Baseline] LLaMA2-7B RLHF training curves" on Apr 9, 2024
mickelliu (Contributor) commented Apr 28, 2024

Very interesting. Glad to see you were able to get good results with the current setup.
I'm contributing the training curve for fine-tuning another LLaMA2-based model, Tulu2-7B, with UltraRM-13B on the UltraFeedback dataset.

[image: training curves for Tulu2-7B with UltraRM-13B on UltraFeedback]

The fine-tuned result (in terms of rewards) isn't as high as with other libraries (e.g., EasyLM) under similar hyperparameter settings, and I'm still trying to figure out why.

mickelliu (Contributor) commented:

> The fine-tuned result (in terms of rewards) isn't as high as with other libraries (e.g., EasyLM) under similar hyperparameter settings, and I'm still trying to figure out why.

Btw, this is resolved. I was able to get well-performing models, comparable to our other setups, with just a few minor differences. Great work!
