[PaddleNLP 3.0] Update README #8681

DrownFish19 · 2024-06-28T03:58:26Z

PR types

Others

PR changes

Docs

Description

Update README.md and README_en.md.
Add markdownlint tool to format .md files.

paddle-bot · 2024-06-28T03:58:30Z

Thanks for your contribution!

codecov · 2024-06-28T04:29:20Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 55.42%. Comparing base (e336e78) to head (2e6b521).
Report is 1 commits behind head on develop.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #8681   +/-   ##
========================================
  Coverage    55.42%   55.42%           
========================================
  Files          626      626           
  Lines        98082    98082           
========================================
  Hits         54364    54364           
  Misses       43718    43718

gongel · 2024-07-05T03:56:57Z

README.md


-* **2023.08.15 [PaddleNLP v2.6](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.6.0)**： 发布[全流程大模型工具链](./llm)，涵盖预训练，精调，压缩，推理以及部署等各个环节，为用户提供端到端的大模型方案和一站式的开发体验；内置[4D并行分布式Trainer](./docs/trainer.md)，[高效微调算法LoRA/Prefix Tuning](./llm#33-lora), [自研INT8/INT4量化算法](./llm#6-量化)等等；全面支持[LLaMA 1/2](./llm/llama), [BLOOM](.llm/bloom), [ChatGLM 1/2](./llm/chatglm), [GLM](./llm/glm), [OPT](./llm/opt)等主流大模型
+* **2023.08.15 [PaddleNLP v2.6](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.6.0)**： 发布[全流程大模型工具链](./llm)，涵盖预训练，精调，压缩，推理以及部署等各个环节，为用户提供端到端的大模型方案和一站式的开发体验；内置[4D并行分布式Trainer](./docs/trainer.md)，[高效微调算法LoRA/Prefix Tuning](./llm#33-lora), [自研INT8/INT4量化算法](./llm#6-量化)等等；全面支持[LLaMA 1/2](./llm/config/llama), [BLOOM](.llm/config/bloom), [ChatGLM 1/2](./llm/config/chatglm), [GLM](./llm/config/glm), [OPT](./llm/config/opt)等主流大模型


The BLOOM link is wrong, and GLM could simply be dropped.

The BLOOM link has been fixed and GLM has been removed.
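
For reference, the BLOOM target in the diff above (`.llm/config/bloom`, with no `/` after the leading dot) is the kind of slip a small script can catch. A minimal sketch, assuming inline markdown links only; `suspicious_relative_links` is a hypothetical helper, not part of this PR or of markdownlint:

```python
import re

# Inline markdown links of the form [text](target); reference-style links are ignored.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def suspicious_relative_links(markdown: str) -> list[str]:
    """Return link targets that start with '.' but not with './' or '../'."""
    flagged = []
    for target in LINK.findall(markdown):
        if target.startswith(".") and not target.startswith(("./", "../")):
            flagged.append(target)
    return flagged
```

Running it over the changed README line would report only the BLOOM target and leave the `./llm/config/...` links alone.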

wawltor · 2024-07-11T02:37:22Z

README.md

-* **2024.06.27 [PaddleNLP v3.0 Beta](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v3.0.0-beta0)**：拥抱大模型，体验全升级。统一大模型工具链，实现国产计算芯片全流程接入；全面支持飞桨4D并行配置、高效精调策略、高效对齐算法、高性能推理等大模型产业级应用流程；自研极致收敛的RsLoRA+算法、自动扩缩容存储机制Unified Checkpoint和通用化支持FastFFN、FusedQKV助力大模型训推；主流模型持续支持更新，提供高效解决方案。
-
-* **2024.04.24 [PaddleNLP v2.8](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.8.0)**：自研极致收敛的RsLoRA+算法，大幅提升PEFT训练收敛速度以及训练效果；引入高性能生成加速到RLHF PPO算法，打破 PPO 训练中生成速度瓶颈，PPO训练性能大幅领先。通用化支持 FastFFN、FusedQKV等多个大模型训练性能优化方式，大模型训练更快、更稳定。
+* **2024.06.27 [PaddleNLP v3.0 Beta](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v3.0.0-beta0)**：拥抱大模型，体验全升级。统一大模型工具链，实现国产计算芯片全流程接入；全面支持飞桨4D 并行配置、高效精调策略、高效对齐算法、高性能推理等大模型产业级应用流程；自研极致收敛的 RsLoRA+算法、自动扩缩容存储机制 Unified Checkpoint 和通用化支持 FastFFN、FusedQKV 助力大模型训推；主流模型持续支持更新，提供高效解决方案。


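Please unify the two terms "工具链" (toolchain) and "套件" (suite) across the docs; both the LLM directory and the main README use related terminology.

Removed the term "工具链" (toolchain); "套件" (suite) is now used throughout.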

wawltor · 2024-07-11T02:41:26Z

README.md

-支持数据、分片、张量、流水线并行的4D高性能训练，Trainer支持分布式策略配置化，降低复杂分布式组合带来的使用成本；
-Unified Checkpoint大模型存储格式在模型参数分布上支持动态扩缩容训练，降低硬件切换带来的迁移成本。
+
+支持数据、分片、张量、流水线并行的4D 高性能训练，Trainer 支持分布式策略配置化，降低复杂分布式组合带来的使用成本；


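The description of 4D parallelism here could follow the official terminology: model parallelism, group-sharded parameter slicing, pipeline parallelism, and data parallelism.

Following the distributed training overview on the official website, changed it to: "4D high-performance training that supports pure data parallelism, group-sharded data parallelism (parameter slicing), tensor model parallelism, and pipeline model parallelism".

Excerpt:
For models with hundreds of billions of parameters or more, Paddle's multi-dimensional hybrid parallel strategy can be used. It combines pure data parallelism, group-sharded data parallelism, tensor model parallelism, pipeline model parallelism, expert parallelism, and other strategies to give users an efficient distributed training solution for large models.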
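
To make the "4D" wording concrete, here is a small arithmetic sketch. It assumes the usual hybrid-parallel convention that the four degrees multiply to the total device count; the function and argument names are illustrative and are not PaddleNLP Trainer options.

```python
def data_parallel_degree(num_devices: int, sharding: int, tensor: int, pipeline: int) -> int:
    """Derive the pure data-parallel degree from the device count and the other three degrees."""
    if min(num_devices, sharding, tensor, pipeline) < 1:
        raise ValueError("all arguments must be positive integers")
    other = sharding * tensor * pipeline
    if num_devices % other != 0:
        raise ValueError(f"{num_devices} devices cannot be split into groups of {other}")
    # Whatever is left after sharding, tensor, and pipeline groups is plain data parallelism.
    return num_devices // other
```

For example, 32 devices with sharding 2, tensor 4, and pipeline 2 leave a data-parallel degree of 2 under this assumption.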

wawltor · 2024-07-11T02:41:59Z

README.md


 ### <a href=#高效精调与高效对齐> 🤗 高效精调与高效对齐 </a>
-精调和对齐算法深度结合零填充数据流和FlashMask高性能算子，降低训练无效数据填充和计算，大幅提升精调和对齐训练吞吐。
+
+精调和对齐算法深度结合零填充数据流和 FlashMask 高性能算子，降低训练无效数据填充和计算，大幅提升精调和对齐训练吞吐。


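On the alignment side, the DPO algorithm does not use the FlashMask strategy yet, so please drop the mention of alignment here.

Changed to: "The fine-tuning algorithms are deeply integrated with the zero-padding data flow and the FlashMask high-performance operator, which reduces wasted padding and computation during training and substantially improves fine-tuning throughput."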
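
A toy illustration of why less padding means more throughput. It assumes the zero-padding data flow works by packing several samples into one fixed-length row, which this thread does not spell out; the greedy first-fit packing below is a stand-in and not PaddleNLP's implementation.

```python
def padded_tokens(lengths: list[int], max_len: int) -> int:
    """Tokens processed when every sequence is padded to max_len on its own row."""
    return len(lengths) * max_len


def packed_tokens(lengths: list[int], max_len: int) -> int:
    """Tokens processed when sequences are packed first-fit (longest first) into rows of max_len."""
    if any(n < 1 or n > max_len for n in lengths):
        raise ValueError("every length must be between 1 and max_len")
    free = []  # remaining free space in each row opened so far
    for n in sorted(lengths, reverse=True):
        for i, space in enumerate(free):
            if n <= space:
                free[i] = space - n
                break
        else:
            # No existing row has room, so open a new one.
            free.append(max_len - n)
    return len(free) * max_len
```

With lengths 100, 200, 300, and 400 and a row length of 512, padding each sample separately processes 2048 token slots, while packing fits the same 1000 real tokens into 1024.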

wawltor · 2024-07-11T02:42:50Z

README.md

@@ -86,27 +88,28 @@ pip install --upgrade paddlenlp
 pip install --pre --upgrade paddlenlp -f https://www.paddlepaddle.org.cn/whl/paddlenlp.html


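Does installing this way give you the latest dev version?

Yes, it installs the latest version; checking `paddlenlp.version.commit` shows it is the latest build from the previous day.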
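
A minimal sketch of the check described in the reply above. `paddlenlp.version.commit` is the attribute named in the reply; `__version__` is assumed to exist as on most Python packages, and `installed_build` is a hypothetical helper name.

```python
import importlib


def installed_build(package: str = "paddlenlp"):
    """Return (version, commit) for an installed package, or None if it cannot be imported."""
    try:
        pkg = importlib.import_module(package)
        version_mod = importlib.import_module(f"{package}.version")
    except ImportError:
        return None
    # Either attribute may be missing on other packages, so fall back to None.
    return getattr(pkg, "__version__", None), getattr(version_mod, "commit", None)
```

After the nightly `pip install --pre` command, `print(installed_build())` should show the commit the wheel was built from.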

wawltor · 2024-07-11T02:43:22Z

README.md

->>> tokenizer.batch_decode(outputs[0])
-['我是一个AI语言模型，我可以回答各种问题，包括但不限于：天气、新闻、历史、文化、科学、教育、娱乐等。请问您有什么需要了解的吗？']
+>>> print(tokenizer.batch_decode(outputs[0]))
+['我是一个 AI 语言模型，我可以回答各种问题，包括但不限于：天气、新闻、历史、文化、科学、教育、娱乐等。请问您有什么需要了解的吗？']


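Spaces should not be inserted here, should they? It looks like this needs to be exempted from the markdown formatting.

Updated the formatting script so that spacing between Chinese and English text is not applied inside code blocks.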
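
The exemption can be sketched roughly as follows. This is not the script added in this PR: it assumes only basic CJK ideographs and ASCII letters need handling (letters only, which matches the `飞桨4D 并行` spacing visible in the diffs here), and it treats any line starting with three backticks as a fence toggle.

```python
import re

FENCE = "`" * 3  # built at runtime so this example does not contain a literal fence
CJK = r"\u4e00-\u9fff"  # basic CJK Unified Ideographs block only (assumption)
CJK_THEN_LATIN = re.compile(rf"([{CJK}])([A-Za-z])")
LATIN_THEN_CJK = re.compile(rf"([A-Za-z])([{CJK}])")


def add_spaces(text: str) -> str:
    """Insert a space at each CJK/Latin-letter boundary, skipping fenced code blocks."""
    out = []
    in_fence = False
    for line in text.split("\n"):
        if line.lstrip().startswith(FENCE):
            # Fence lines toggle the state and are never rewritten themselves.
            in_fence = not in_fence
        elif not in_fence:
            line = CJK_THEN_LATIN.sub(r"\1 \2", line)
            line = LATIN_THEN_CJK.sub(r"\1 \2", line)
        out.append(line)
    return "\n".join(out)
```

Prose lines get the spacing, while the sample model output inside the README's code block stays byte-for-byte as the tokenizer printed it.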

wawltor · 2024-07-11T02:43:58Z

README_en.md

->>> tokenizer.batch_decode(outputs[0])
-['我是一个AI语言模型，我可以回答各种问题，包括但不限于：天气、新闻、历史、文化、科学、教育、娱乐等。请问您有什么需要了解的吗？']
+>>> print(tokenizer.batch_decode(outputs[0]))
+['我是一个 AI 语言模型，我可以回答各种问题，包括但不限于：天气、新闻、历史、文化、科学、教育、娱乐等。请问您有什么需要了解的吗？']


Updated the formatting script so that spacing between Chinese and English text is not applied inside code blocks.

wawltor · 2024-07-11T03:08:06Z

llm/README.md


-此项目支持了LLaMA、GPT-3、BaiChuan、Qwen、Mixtral 等大模型的预训练。用户切换配置config文件，即可一键运行。
+数据详细制作流程可参考[此处](https://paddlenlp.readthedocs.io/zh/latest/llm/pretraining/dataset.html) , [Pretrain 和自定义数据集](https://paddlenlp.readthedocs.io/zh/latest/llm/pretraining/dataset.html)


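It looks like the same link appears twice here.

Removed. "Pretrain and custom datasets" is now merged into the data processing part above. After the change it reads:

We provide more detailed documentation here on pretraining data preparation, Pretrain and custom datasets, supported distributed strategies, and performance test reports; see: Introduction to LLM Pretraining, LLM Weights List.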
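
The duplicate flagged above is easy to detect mechanically. A minimal sketch, assuming inline markdown links; `duplicate_link_targets` is a hypothetical helper and not a markdownlint rule:

```python
import re
from collections import Counter

# Inline markdown links of the form [text](target).
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def duplicate_link_targets(text: str) -> list[str]:
    """Return link targets that occur more than once in text, in first-seen order."""
    counts = Counter(LINK.findall(text))
    return [url for url, n in counts.items() if n > 1]
```

Applied to the added line in `llm/README.md`, it returns the `dataset.html` URL once, since both link texts point at the same page.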

wawltor · 2024-07-11T03:10:33Z

README.md

@@ -158,8 +160,8 @@ python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" run_finetune.py



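How will the model list in the main README be presented from now on?

Through an issue list: #8663 (pin requested).

Let's put some links for this in the README.

Added a description and links:

Model weights are supported for the LLaMA, Baichuan, Bloom, ChatGLM, Gemma, Mistral, OPT, and Qwen series; full list 👉 [LLM] Model parameter support list

4D parallelism and operator optimizations are supported for the LLaMA, Baichuan, Bloom, ChatGLM, Gemma, Mistral, OPT, and Qwen series; full list 👉 [LLM] Model 4D parallelism and operator support list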

DrownFish19 · 2024-07-11T06:23:51Z

docs/llm/docs/quantization.md

- `auto_clip`: AWQ时是否进行自动搜索截断值并对模型权重进行截断操作，截断操作有利于量化模型精度，但搜索速度较慢。默认为False。
- `autoclip_step`: AutoClip步数，也即模型前向次数，采样时默认concat每轮数据用来搜索截断值，默认为8。
-
+<summary>&emsp; 量化参数（QuantArgument）</summary>


Fixed Chinese-English spacing and spelling issues here.

ZHUI · 2024-07-11T06:16:20Z

.markdownlint.yaml

+  # Only check sibling headings
+  siblings_only: false
+
+# MD025/single-title/single-h1 : Multiple top-level headings in the same document : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md025.md


ZHUI · 2024-07-11T06:23:51Z

README.md

@@ -158,8 +160,8 @@ python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" run_finetune.py



Let's put some links for this in the README.

ZHUI · 2024-07-11T06:32:05Z

The documentation symlinks under the latest dev branch are all missing.
https://github.com/PaddlePaddle/PaddleNLP/tree/develop/docs/llm

https://github.com/PaddlePaddle/PaddleNLP/tree/v2.8.1/docs/llm
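
One way to compare the two trees linked above is to list the symlinks in a local checkout of each and diff the results. A minimal sketch using only the standard library; `list_symlinks` is a hypothetical helper and the checkout paths are up to the reader:

```python
import os


def list_symlinks(root: str) -> dict[str, bool]:
    """Map each symlink under root (path relative to root) to whether its target exists."""
    found = {}
    # os.walk does not follow directory symlinks by default, so each link is reported once.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                # os.path.exists follows the link, so it is False for a dangling symlink.
                found[os.path.relpath(full, root)] = os.path.exists(full)
    return found
```

Running it on `docs/llm` in a `v2.8.1` checkout and again on `develop` would show which links disappeared and which are present but dangling.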

ZHUI

LGTM

update

ca9518c

gongel previously approved these changes Jun 28, 2024

DrownFish19 and others added 2 commits July 3, 2024 19:54

Merge branch 'PaddlePaddle:develop' into dev_update_readme_3.0beta

8d30c5d

update readme

d307fbb

DrownFish19 dismissed gongel’s stale review via d307fbb July 3, 2024 11:58

DrownFish19 and others added 8 commits July 4, 2024 02:09

update

7dfafba

update

91886d6

Merge branch 'PaddlePaddle:develop' into dev_update_readme_3.0beta

f5fe743

update

392ed1f

Merge branch 'dev_update_readme_3.0beta' of github.com:DrownFish19/PaddleNLP into dev_update_readme_3.0beta

ecc0b1a

update

99617ec

update

7745c49

update

774c45b

DrownFish19 changed the title ~~Update README~~ [PaddleNLP 3.0] Update README Jul 5, 2024

gongel reviewed Jul 5, 2024

DrownFish19 added 2 commits July 5, 2024 04:06

update

7a46e6a

update README(EN)

c6d2104

gongel pushed a commit that referenced this pull request Jul 8, 2024

[cherry pick] Update README (#8681) (#8727)

e773524

* update * update readme * update * update * update * update * update * update * update * update README(EN)

jzhang533 and others added 10 commits July 10, 2024 12:21

correct broken links in readme

6f5e235

Signed-off-by: Zhang Jun <jzhang533@gmail.com>

Merge branch 'PaddlePaddle:develop' into dev_update_readme_3.0beta

ae6c69d

add markdownlint

bacb006

update

dc88f8f

add check_spaces

7f68f68

update

b09c85e

Merge remote-tracking branch 'paddlenlp-zhangjun/update-break-links-in-readme' into dev_update_readme_3.0beta

1924426

Merge branch 'PaddlePaddle:develop' into dev_update_readme_3.0beta

46a55c9

Merge branch 'develop' into dev_update_readme_3.0beta

f35db62

update readme

f9efc5d

DrownFish19 and others added 4 commits July 10, 2024 20:15

Merge branch 'PaddlePaddle:develop' into dev_update_readme_3.0beta

26a2563

update readme

87db3a8

update

7d0627c

update

b4a5862

update readme

DrownFish19 force-pushed the dev_update_readme_3.0beta branch from 7d0627c to b4a5862 Compare July 10, 2024 12:41

Merge branch 'dev_update_readme_3.0beta' of github.com:DrownFish19/PaddleNLP into dev_update_readme_3.0beta

68ac2b3

wawltor reviewed Jul 11, 2024

DrownFish19 and others added 2 commits July 11, 2024 06:13

update readmes

d3d90a9

Merge branch 'develop' into dev_update_readme_3.0beta

0869e22

DrownFish19 commented Jul 11, 2024

ZHUI reviewed Jul 11, 2024

DrownFish19 added 5 commits July 11, 2024 06:41

update

0574f12

update

f99f61b

update llm doc index

6ad298b

update

e00e1d1

add rlhf docs

2e6b521

ZHUI approved these changes Jul 12, 2024

wawltor merged commit 10d058d into PaddlePaddle:develop Jul 12, 2024
10 of 12 checks passed

DrownFish19 deleted the dev_update_readme_3.0beta branch July 12, 2024 03:40
