
Add PPO training. #7305

Merged: 20 commits merged into PaddlePaddle:develop on Jan 22, 2024

Conversation

@guoshengCS (Collaborator) commented Oct 24, 2023

PR types

New features

PR changes

Others

Description

Add PPO training.

@paddle-bot bot commented Oct 24, 2023

Thanks for your contribution!

@codecov bot commented Oct 24, 2023

Codecov Report

Attention: 35 lines in your changes are missing coverage. Please review.

Comparison is base (16d3c49) 56.68% compared to head (ec150b6) 56.70%.
Report is 1 commit behind head on develop.

Files                                      Patch %   Lines
paddlenlp/generation/utils.py              46.66%    32 Missing ⚠️
paddlenlp/transformers/llama/modeling.py   57.14%    3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7305      +/-   ##
===========================================
+ Coverage    56.68%   56.70%   +0.02%     
===========================================
  Files          588      588              
  Lines        89243    89305      +62     
===========================================
+ Hits         50584    50639      +55     
- Misses       38659    38666       +7     

☔ View full report in Codecov by Sentry.

@guoshengCS (Collaborator, Author) commented Dec 27, 2023

Compared with Beaver (DeepSpeed), the PPOTrainer implementation here is more complex. The main reason is that the logic covering a complete single training step (forward + backward + opt.step) had to be copied out of the coarse-grained Trainer.train and extracted into full_train_step; beyond that, some code was copied and adapted in order to align with, and reuse, Trainer's features and behavior as far as possible.
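
For context, a minimal sketch of such a self-contained single-step method (simplified; compute_loss, optimizer, and lr_scheduler follow the usual Trainer layout and are assumptions here, not the exact code in this PR):

    def full_train_step(self, model, inputs):
        # Sketch of one complete training step (forward + backward + opt.step),
        # the logic this comment says had to be extracted from Trainer.train.
        # The real method also has to replicate Trainer behavior such as AMP,
        # gradient accumulation, and distributed wrapping.
        model.train()
        loss = self.compute_loss(model, inputs)  # forward
        loss.backward()                          # backward
        self.optimizer.step()                    # opt.step
        self.lr_scheduler.step()
        self.optimizer.clear_grad()
        return loss.detach()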

If Trainer exposed finer-grained methods covering forward, backward, and opt.step, implementing PPOTrainer and other algorithms with more complex training logic would be easier, similar to how Beaver uses the DeepSpeed engine (both actor_model and reward_critic_model are DeepSpeed engines): https://github.com/PKU-Alignment/safe-rlhf/blob/main/safe_rlhf/algorithms/ppo/trainer.py#L171

[screenshot of the linked Beaver PPO trainer code]
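
For illustration, a minimal sketch of that fine-grained pattern in the Beaver/DeepSpeed style (actor_model and reward_critic_model are assumed to be engine-like objects exposing backward() and step(); the loss helpers are hypothetical):

    def rl_step(self, batch):
        # Update the actor (policy) with explicit fine-grained calls.
        actor_loss = self.compute_actor_loss(batch)   # forward
        self.actor_model.backward(actor_loss)         # backward
        self.actor_model.step()                       # optimizer step

        # The critic is updated independently through its own engine.
        critic_loss = self.compute_critic_loss(batch)
        self.reward_critic_model.backward(critic_loss)
        self.reward_critic_model.step()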

@guoshengCS marked this pull request as ready for review on January 2, 2024 02:13
@guoshengCS changed the title from "Add PPO reward model and training." to "Add PPO training." on Jan 8, 2024
@@ -1214,7 +1217,7 @@ def __init__(self, config: LlamaConfig):

         # Recompute defaults to False and is controlled by Trainer
         self.enable_recompute = False
-        if config.tensor_parallel_degree > 1:
+        if config.tensor_parallel_degree > 1 and config.vocab_size % config.tensor_parallel_degree == 0:
A collaborator commented on the diff:

Can this check instead be handled at initialization time? I recall there is already a divisibility check on the embedding side.
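
For context, the guard matters because a vocab-parallel LM head shards the vocabulary dimension across tensor-parallel ranks, which only works when the split is even (a simplified illustration, not the actual modeling code):

    def can_shard_vocab(vocab_size: int, tensor_parallel_degree: int) -> bool:
        # Each rank owns vocab_size // tensor_parallel_degree rows of the
        # embedding / LM-head weight; an uneven split would leave rows over.
        return tensor_parallel_degree > 1 and vocab_size % tensor_parallel_degree == 0

    # e.g. a 32000-token vocab shards cleanly across 4 ranks (8000 rows each),
    # but not across 3:
    assert can_shard_vocab(32000, 4)
    assert not can_shard_vocab(32000, 3)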

@ZHUI self-requested a review on January 15, 2024 03:06
@guoshengCS (Collaborator, Author) commented Jan 19, 2024

@wj-Mcat The Test CI failed. How should tests that need multi-card execution (run via python -m paddle.distributed.launch) be added?

[screenshot of the failing Test CI output]
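
One common pattern for this kind of test (an assumption here, not necessarily what the CI adopted) is to shell out to the launcher so the test runner itself stays single-process; tests/run_ppo_multi_gpu.py is a hypothetical entry script:

    import subprocess
    import sys
    import unittest

    class TestPPOMultiCard(unittest.TestCase):
        def test_ppo_two_cards(self):
            # Launch the real multi-card job under paddle.distributed.launch
            # and assert that it exits cleanly.
            result = subprocess.run(
                [sys.executable, "-m", "paddle.distributed.launch",
                 "--gpus", "0,1", "tests/run_ppo_multi_gpu.py"],
                capture_output=True, text=True,
            )
            self.assertEqual(result.returncode, 0, result.stderr)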

@wawltor (Collaborator) left a comment:

LGTM

@wawltor merged commit d4de12c into PaddlePaddle:develop on Jan 22, 2024
7 of 9 checks passed