v3.0.0-beta0 (Latest)

Released 28 Jun 03:05 by DrownFish19, commit a2b8a78

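We are pleased to announce the release of v3.0.0-beta of the PaddlePaddle large model suite: embracing large models, with a fully upgraded experience. Highlights:

Unified LLM toolchain, with full-workflow integration of Chinese-made AI accelerators;
Full support for industrial-grade LLM workflows, including PaddlePaddle 4D parallel configuration, efficient fine-tuning strategies, efficient alignment algorithms and high-performance inference;
The in-house RsLoRA+ algorithm with extremely fast convergence, the Unified Checkpoint storage mechanism with automatic scale-out/scale-in, and generalized FastFFN and FusedQKV support to speed up LLM training and inference;
Continued support and updates for mainstream models, with efficient solutions.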

LLM fine-tuning, alignment, training and inference optimization

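PEFT:
- Added a scaling strategy, supporting the rslora and pissa algorithms in #8256
- Adapted to FusedQKV and FastFFN parameters in #8372 #8526
DPO:
- Support for DPO (llama, qwen) in #8474
- Support for sequence parallelism in #7953
Chinese-made chip support:
- NPU adaptation in #8303 #8342 #8359 #8399 #8409 #8401 #8431 #8439 #8438 #8442 #8528 #8642
- XPU adaptation in #8282 #8505 #8515 #8588 #8595 #8598
- GCU adaptation in #8445 #8470
Performance optimization:
- Optimized the Unified Checkpoint mechanism in #8204 #8409 #8422 #8512
- Model parallelism optimization in #8370
- Sequence parallelism optimization in #8551
- Support for llama3 (wint8|4/a8w8) in #8630
Other:
- Added model memory monitoring in #8269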
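The DPO support listed above (#8474) trains on preference pairs. As a rough illustration of the objective, the sketch below computes the commonly used sigmoid form of the DPO loss for one pair from summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model. It is a minimal sketch of the published formula, not PaddleNLP's implementation; the function names and the `beta` default of 0.1 are assumptions.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one preference pair (hypothetical helper, not a PaddleNLP API).

    Each argument is the summed token log-probability of a full response.
    """
    # How much more the policy favors each response than the reference model does.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)) == softplus(-margin)
    return softplus(-margin)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, so the loss is log(2).
    print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
    # Policy shifted toward the chosen response: the loss drops below log(2).
    print(dpo_loss(-10.0, -17.0, -12.0, -15.0))
```

Here `beta` acts as the strength of the implicit KL penalty: larger values keep the trained policy closer to the reference model.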

New models

Added Gemma models in #8082
- google/gemma-7b
- google/gemma-7b-it
- google/gemma-2b
- google/gemma-2b-it
Added Llama 3 models in #8307 #8371
- meta-llama/Meta-Llama-3-8B
- meta-llama/Meta-Llama-3-8B-Instruct
- meta-llama/Meta-Llama-3-70B
- meta-llama/Meta-Llama-3-70B-Instruct
Added Qwen2 models in #8338 #8584 #8601
- Qwen/Qwen1.5-0.5B
- Qwen/Qwen1.5-0.5B-Chat
- Qwen/Qwen1.5-1.8B
- Qwen/Qwen1.5-1.8B-Chat
- Qwen/Qwen1.5-4B
- Qwen/Qwen1.5-4B-Chat
- Qwen/Qwen1.5-7B
- Qwen/Qwen1.5-7B-Chat
- Qwen/Qwen1.5-14B
- Qwen/Qwen1.5-14B-Chat
- Qwen/Qwen1.5-32B
- Qwen/Qwen1.5-32B-Chat
- Qwen/Qwen1.5-72B
- Qwen/Qwen1.5-72B-Chat
- Qwen/Qwen1.5-110B
- Qwen/Qwen1.5-110B-Chat
- Qwen/Qwen1.5-MoE-A2.7B
- Qwen/Qwen1.5-MoE-A2.7B-Chat
- Qwen/Qwen2-0.5B
- Qwen/Qwen2-0.5B-Instruct
- Qwen/Qwen2-1.5B
- Qwen/Qwen2-1.5B-Instruct
- Qwen/Qwen2-7B
- Qwen/Qwen2-7B-Instruct
- Qwen/Qwen2-72B
- Qwen/Qwen2-72B-Instruct
- Qwen/Qwen2-57B-A14B
- Qwen/Qwen2-57B-A14B-Instruct

Basic framework upgrades

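Feature improvements:
- Automatic fusing and splitting of FusedQKV and FastFFN weights in #8202 #8378 #8432
- Support for synchronized parameter settings in model parallelism in #8311
- The RoPE operator supports setting theta in #8440
- Communication overlap optimization in #8276 #8473 #8499 #8594
AutoParallel optimization:
- Recompute support for llama in #8265
- Llama 3 adaptation in #8395
- position_ids optimization in #8363
- Pipeline-parallel split_backward support in #8479
- Qwen adaptation in #8312
Distributed capability optimization:
- Fixed a wrong parameter in enable_sharding_comm_overlap under pipeline parallelism in #8333
- MoE parallelism support in #8498 #8522
Chat capability optimization:
- Added chat template in #8226
Other:
- Documentation in #8336 #8393
- Updated nested operations in #8380
- Randomness-related updates in #8450 #8396
- Operator updates in #8472
- Example updates in #8538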
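FusedQKV weight fusing and splitting (#8202 #8378 #8432) can be pictured as concatenating the separate q, k and v projection weights along the output dimension, so that one matmul produces all three projections, and slicing them apart again when needed. The sketch below assumes plain-list weights laid out as [in_features][out_features] and a simple non-interleaved concatenation; the function names are hypothetical, and the real conversion may interleave heads or account for tensor parallelism.

```python
from typing import List, Tuple

# Row-major weight matrix laid out as [in_features][out_features] (an assumption).
Matrix = List[List[float]]


def fuse_qkv(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Concatenate q, k and v weights along the output dimension."""
    if not (len(q) == len(k) == len(v)):
        raise ValueError("q, k and v must share in_features")
    return [qr + kr + vr for qr, kr, vr in zip(q, k, v)]


def split_qkv(fused: Matrix, q_out: int, kv_out: int) -> Tuple[Matrix, Matrix, Matrix]:
    """Inverse of fuse_qkv; kv_out may be smaller than q_out, e.g. with GQA."""
    q = [row[:q_out] for row in fused]
    k = [row[q_out:q_out + kv_out] for row in fused]
    v = [row[q_out + kv_out:q_out + 2 * kv_out] for row in fused]
    return q, k, v


def matvec(x: List[float], w: Matrix) -> List[float]:
    """x @ w for a single input vector."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


if __name__ == "__main__":
    q = [[1.0, 2.0], [3.0, 4.0]]
    k = [[5.0], [6.0]]
    v = [[7.0], [8.0]]
    fused = fuse_qkv(q, k, v)
    # One projection with the fused weight yields [q_out | k_out | v_out].
    print(matvec([1.0, 1.0], fused))
    print(split_qkv(fused, q_out=2, kv_out=1))
```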

Bug fixes

Fixed the sharding < 100 limitation bug in #8146
Fixed TP/PP parameter merging in #8239
Fixed the inconsistency between tensor.shape and paddle.shape(tensor) in #8260
Fixed a bug with fp16 + delay_scale_loss_scale + sharding_stage1_overlap in #8314
Added pipelines run documentation and hints in #8292 #8308 #8202 #8353
Fixed tokenizer input in the text feature extraction task in #8331
Fixed import errors in #8332 #8367

Restructuring

Restructured the PaddleNLP file layout in #8609 #8613 #8605 #8614 #8617 #8626 #8618 #8625 #8619 #8629 #8601 #8627 #8666

What's Changed

[dist]pip requirements-dev.txt by @Liujie0926 in #8258
add scaling by @lugimzzz in #8256
[LLM]Support Gemma model by @Southpika in #8082
[BugFix] Try except sequence parallel utils by @DesmonDay in #8189
Update CodeCov GitHub Action by @sijunhe in #8268
[AutoParallel] Open recompute strategy for llama model by @zhangbo9674 in #8265
Fix sharding < 100 limitation bug by @sneaxiy in #8146
use tensor.shape bug not paddle.shape(tensor) by @wanghuancoder in #8260
[dist CI]update paddlenlp install for CI by @Liujie0926 in #8267
[Bug Fix]Fix merge parameters in pp by @Southpika in #8239
[LLM] add memory stats to logger of trainer by @SylarTiaNII in #8269
Add p2p_comm_overlap for Llama-2-70b benchmark. by @Xreki in #8276
add a100 test ground truth by @zhiqiu in #8249
[paddle-pipelines] faq semantic search question answering reamde by @w5688414 in #8292
[paddle-pipelines] Add pipelines documentation by @w5688414 in #8308
Support llama-3 by @ZHUI in #8307
[Distributed] [CustomDevices] Adapt SP on lora && polish MC2 APIs by @SylarTiaNII in #8303
fix bug for fp16 + delay_scale_loss_scale + sharding_stage1_overlap by @FeixLiu in #8314
[paddle-pipelines] Update mkdocs by @w5688414 in #8310
[benchmark]update llama2_ips by @Liujie0926 in #8322
[dist CI]fix before_hook by @Liujie0926 in #8283
benchmark llama worker=1 by @wanghuancoder in #8305
【AutoParallel】Add llama2 UT for auto-parallel by @heavyrain-lzy in #8300
Add system env log for llama test by @zhangbo9674 in #8321
[LLM] Support fuse attention q, k, v weights by @DrownFish19 in #8202
[Distributed] fix lora by @SylarTiaNII in #8325
fix try import by @w5688414 in https://github.com/PaddlePaddle/Pa...

Contributors

zhiqiu, jeff41404, and 49 other contributors


v2.8.1

Released 20 Jun 07:42 by ZHUI, commit db99efd

What's Changed

[Trainer] Fix sharding overlap bug by @DesmonDay in #8334
[Cherry-pick] update truncate by @KB-Ding in #8375
[BugFix] Fix llama3 eot_id. by @ZHUI in #8373
[Trainer] update distributed dataloader by @DesmonDay in #8426
[BugFix] Fix load rng compatibility. by @ZHUI in #8451
Cherry pick/fast_safe_open by @ZHUI in #8458
【cherry pick】adapter new type promotion rule for Paddle 2.6 by @zxcd in #8463
Quick fix from pretrained. by @ZHUI in #8487
Release/2.8 by @Galaxy1458 in #8437
Fix from_pretrained os.path.split by @DesmonDay in #8508
[fea] Cherry-picked MOE updates from develop by @bo-ke in #8531
[LLM] relocate tensor_parallel_output to avoid conflict (#8419) by @DesmonDay in #8533
Update sequence_parallel for predict by @DesmonDay in #8547
Cp/fix by @ZHUI in #8569
Do not save moe_group by @DesmonDay in #8570
[Release] 2.8.1 by @ZHUI in #8636

Full Changelog: v2.8.0...v2.8.1

Contributors

zxcd, ZHUI, and 4 other contributors


v2.8.0

Released 24 Apr 10:04 by w5688414, commit 3105c18

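We are pleased to announce the release of v2.8.0 of the PaddlePaddle large model suite. In this release we have deeply optimized the suite's LLM fine-tuning and alignment capabilities and improved LLM training and inference on Chinese-made computing hardware. Details:

Featured fine-tuning and efficient alignment: the in-house RsLoRA+ algorithm with extremely fast convergence substantially improves PEFT convergence speed and training quality; high-performance generation acceleration is brought into the RLHF PPO algorithm, removing the generation-speed bottleneck in PPO training and giving PPO training a large performance lead.
Faster LLM training: generalized support for multiple LLM training optimizations such as FastFFN and FusedQKV makes LLM training faster and more stable.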
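RsLoRA+ builds on rsLoRA (added as `rslora` in #8111). In standard LoRA the frozen weight W is augmented with a trainable low-rank update A·B scaled by alpha/r; rank-stabilized LoRA instead scales by alpha/sqrt(r), so the update does not shrink as the rank grows. The sketch below illustrates only that scaling difference with plain-list matrices; the function names are hypothetical, and it does not model the other pieces of RsLoRA+.

```python
import math
from typing import List

Matrix = List[List[float]]


def lora_scaling(alpha: float, rank: int, use_rslora: bool = False) -> float:
    """Scale for the low-rank update: alpha/r (LoRA) or alpha/sqrt(r) (rsLoRA)."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    return alpha / math.sqrt(rank) if use_rslora else alpha / rank


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product of two plain-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x: Matrix, w: Matrix, a: Matrix, b: Matrix, scaling: float) -> Matrix:
    """y = x @ W + scaling * (x @ A) @ B; only A and B are trained."""
    base = matmul(x, w)
    delta = matmul(matmul(x, a), b)
    return [[bv + scaling * dv for bv, dv in zip(br, dr)] for br, dr in zip(base, delta)]


if __name__ == "__main__":
    for r in (8, 64):
        print(r, lora_scaling(16, r), lora_scaling(16, r, use_rslora=True))
```

With alpha=16, moving from r=8 to r=64 shrinks the classic LoRA factor from 2.0 to 0.25, while the rsLoRA factor only goes from about 5.66 to 2.0, which is why rsLoRA stays effective at higher ranks.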

LLM fine-tuning, alignment, training and inference optimization

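Fine-tuning
- PEFT
  - Added pipeline-parallel support for QLoRA #7801
  - Custom Python operators to optimize LoRA forward and backward computation #8106
  - Added the rslora, lora+ and pissa algorithms #8111
- Long sequences
  - Decoupled long-sequence schemes from models: RotaryEmbedding, LinearScalingRotaryEmbedding, NTKScalingRotaryEmbedding, DynamicNTKScalingRotaryEmbedding, etc. #8076
- Alignment
  - Added the PPO alignment algorithm #7305
- Training strategies
  - Added LLaMA sequence parallelism #7746
  - Added LLaMA master_grad #7658
  - Added auto_parallel support for GPT #8160
- New operators
  - Added GQA operator support #7906
  - Added GQA fused attention qkv #7890
  - Added the SwiGLU operator #8038
Inference
- Added static-graph inference for QWenVL #7808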
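#8076 decouples long-sequence RoPE variants from individual models. The sketch below shows the two most common ideas as they usually appear in open-source implementations: linear scaling divides the position index by a factor, while NTK-aware scaling enlarges the rotary base as base · factor^(dim/(dim-2)). The function names, the interleaved-pair layout and the exact NTK formula are assumptions and may differ in detail from PaddleNLP's classes. The dynamic NTK variant, which recomputes the factor from the current sequence length, is not shown.

```python
import math
from typing import List


def inv_freq(dim: int, base: float = 10000.0) -> List[float]:
    """Rotary inverse frequencies base**(-2i/dim) for i in [0, dim/2)."""
    return [base ** (-(2 * i) / dim) for i in range(dim // 2)]


def rope_angles(position: int, dim: int, base: float = 10000.0,
                linear_factor: float = 1.0, ntk_factor: float = 1.0) -> List[float]:
    """Rotation angle for each dimension pair at a given position."""
    if ntk_factor != 1.0:
        # NTK-aware scaling: enlarge the base so low frequencies stretch further.
        base = base * ntk_factor ** (dim / (dim - 2))
    # Linear (position-interpolation) scaling: compress positions instead.
    pos = position / linear_factor
    return [pos * f for f in inv_freq(dim, base)]


def apply_rope(vec: List[float], position: int, **kwargs: float) -> List[float]:
    """Rotate interleaved pairs (x0, x1), (x2, x3), ... by their angles."""
    if len(vec) % 2:
        raise ValueError("vector length must be even")
    out: List[float] = []
    for i, theta in enumerate(rope_angles(position, len(vec), **kwargs)):
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out


if __name__ == "__main__":
    print(rope_angles(10, 8))
    print(rope_angles(10, 8, linear_factor=2.0))
    print(rope_angles(10, 8, ntk_factor=4.0))
```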
New models
Added Deberta and Debertav2 models #8227
- deepset/deberta-v3-large-squad2
- microsoft/deberta-v2-xlarge
- microsoft/deberta-v3-base
- microsoft/deberta-v3-large
- microsoft/deberta-base
Added mixtral-of-experts #7803
- mistralai/Mixtral-8x7B-Instruct-v0.1
- mistralai/Mixtral-8x7B-v0.1
Added Llama 3 #8315
- meta-llama/Meta-llama-3-8b
- meta-llama/Meta-Llama-3-8B-Instruct
- meta-llama/Meta-llama-3-70b
- meta-llama/Meta-Llama-3-70B-Instruct

Basic framework upgrades

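Trainer upgrades
- Trainer adds the ignore_save_lr_and_optim argument, which skips saving the lr scheduler and optimizer weights #7978
- Trainer adds Wandb and Tensorboard support. #7863
- Trainer can parse command-line and JSON file arguments at the same time #7768
- Trainer adds gradient_sync_after_accumulate support. #8045
- Added a CUDA build check to the dataloader #8099
AutoParallel upgrades
- Llama auto parallel supports bf16 loss #7874
- Added the refined-recompute mechanism #7349
- Support for master_grad under the AMP-O2 strategy #7658
- Further improved the basic functionality of unified dynamic/static auto-parallel distributed training #7985 #8114
- Added semi-auto training of the Llama2 model based on AutoTrainer #7851 #7885
- Added the hybrid_parallel_topo_order strategy for llama. #8011
- Unified dynamic and static network construction for the llama model #8127
Other
- Refactored the download logic to support downloading models from BOS, HF Hub, AI Studio and ModelScope #7608 #8020 #8088
- Added pipeline parallelism for distributed training #8051
- Adapted FA (FlashAttention) for NPU #8171 #8210
- Added block_attention/cachekv quant for llama #7649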

Other support

Added the matryoshka (Matryoshka Representation Learning) retrieval strategy, which saves compute and storage resources. #8165
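Matryoshka representation learning trains embeddings so that a prefix of the vector is itself a usable embedding. At retrieval time one can keep only the first `dim` components and re-normalize, trading some accuracy for a smaller index and cheaper similarity computation. The sketch below shows only that inference-side truncation with cosine search; the training-side multi-granularity loss is not shown, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and L2-normalize them."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be in (0, len(vec)]")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def search(query: Sequence[float], docs: List[Sequence[float]], dim: int, top_k: int = 1) -> List[int]:
    """Return indices of the top_k docs by cosine similarity at the truncated size."""
    q = truncate_embedding(query, dim)
    scored = [(cosine(q, truncate_embedding(d, dim)), i) for i, d in enumerate(docs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:top_k]]


if __name__ == "__main__":
    docs = [[0.0, 1.0, 9.0, 9.0], [1.0, 0.1, -9.0, -9.0]]
    query = [1.0, 0.0, 0.5, 0.5]
    print(search(query, docs, dim=2), search(query, docs, dim=4))
```

As the toy example suggests, rankings at a truncated size can differ from full-size rankings, so the retained `dim` is usually chosen by evaluating recall on real data.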

Bug fixes

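Changed log levels and added timelog timing logs, compatible with different devices. #8261
Fixed inconsistent randomly initialized shared weights in pipeline parallelism, covering GPT, OPT and other models. #7772
Disabled downloading from the Hugging Face Hub in CI and unit tests #7798 #8198
Fixed duplicated concatenation of query and history when chat template is enabled in the LLM Gradio demo. #7992
Fixed a key error when downloading GPT models. #8253
Fixed LlamaRotaryEmbedding #7882
Fixed the allreduce dtype issue #7876
Fixed the breakage caused by the framework dev branch removing the paddle.jit.dy2static.utils_helper API #7989
Fixed the read-data timer when ignore_data_skip=False and skip_profile_timer=False. #8177
Fixed Wandb unit tests #8066 #8056
Fixed the error when Trainer parses JSON and command-line list arguments at the same time #7860
Fixed inference issues in the Gradio UI #7740 #7788
Fixed basic Tokenizer issues #7797 #7870
Fixed loading rng state on custom devices. #7894
Fixed garbled BF16 loss printing in auto parallel #7874
Initialized models in float to fix AMP errors in static-graph auto parallel #8033 #8199
Fixed incorrect use of the ShardDataloader interface under pipeline parallelism #8014
Fixed llama accuracy issues on custom devices. #7895
Fixed NPU AICPU operator issues #7976
Fixed missing arguments passed to FusedLinearWithGradAdd. #8178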

What's Changed

[Unified Checkpoint] Add unified checkpoint training args doc. by @DesmonDay in #7756
[AutoParallel] Auto Trans PP to VPP by @zhaoyinglia in #7747
Add codecov check by @zjjlivein in #7760
[CE] Delete gpt_for_sequence_classification by @ZHUI in #7757
[DOC] Update trainer.md by @ZHUI in #7761
[Release] Change version to 2.7.0 by @ZHUI in #7764
[benchmark]close skip_memory_metrics for ips by @Liujie0926 in #7732
[Release] Update release.yml to release tags by @ZHUI in #7765
[AutoParallel] Add Sequence Parallel for Static LLaMA by @JZ-LIANG in #7746
[New Features] support dynamic src_length by @wj-Mcat in #7740
Fix unified_checkpoint bug by @DrownFish19 in #7770
[DONE] aistudio, hf hub, bos update download by @JunnYu in #7608
[Trainer] Fix dist dataloader eval by @DesmonDay in #7777
[Paddle-pipelines] Update convert_files_to_dicts_splitter by @w5688414 in #7748
[PEFT]fix lora model tp when existing other trainable module by @lugimzzz in #7781
[Paddle-Pipelines] update faiss by @qingzhong1 in #7793
Fix shared weights sync for PipelineLayer by @DrownFish19 in #7772
[tests] download slow by @JunnYu in #7798
[INFER][LLM] Support qwen in fined grained dybatch v1 by @DanGuge in #7644
Add CE for Distributed Hybrid Parallel by @iosmers in #7782
add MP2-SP2-pp4-vpp2-SD2-stage1-mbs2-acc8 ce by @tianhaodongbd in #7774
[Pretrain] Fix eval during pretrain by @DesmonDay in #7806
pipeline parallel benchmark by @zhangting2020 in #7759
[Bug fixes] fix br gradio by @wj-Mcat in #7788
delete useless code for write_cache_kv.cu by @yuanlehome in #7812
[llm]support qlora pp by @lugimzzz in #7801
Trainer support simultaneously parse JSON files and cmd arguments. by @greycooker in #7768
[LLM] Support block_attention/cachekv quant for llama by @RichardWooSJTU in #7649
[Bug Fix] fix paddle multipy_fwd_func warning message by @BeingGod in #7818
[llm]fix lora by @lugimzzz in #7824
fused rms spmd by @liuzhenhai93 in #7830
[Pretrain] Fix eval during pretrain by @DesmonDay in #7827
[neural search][fix bug of evaluate.py] by @ZeyuTeng96 in #7832
[neural search] fix the bug of reading files when calculating the recall scores by @shenghwa in #7836
[Bug fixes] update chatglm tokenizer by @wj-Mcat in #7797
[semantic_indexing] fix bug of evaluate.py by @ZeyuTeng96 in #7843
[faq] fix bug of evaluate.py by @ZeyuTeng96 in #7840
[text_classification_retrieval_based] fix bug of evaluate.py by @ZeyuTeng96 in #7844
[LLM] add Qwen-7B-Chat to PaddleNLP unit test by @ziangqin-baidu in #7823
Support 5.2 bloom by @zhoutianzi666 in #7846
[unified checkpoint] Fix last checkpoint save by @DrownFish19 in #7854
[unified checkpoint] fix checkpoint names by @DrownFish19 in #7795
[New Features]add ranks testing for test_predictor by @wj-Mcat in #7800
[Auto Parallel] Support dynamic semi-auto training in Llama2 model by @haohongxiang in #7851
[CI] add ci approval pipelines by @zjjlivein in #7859
[fix] fix a bug of trainer/argparser.py by @greycooker in #7860
[Improvement] fix ops improting in utils by @wj-Mcat in #7865
[Add CE] Add CE for Hybrid Parallism by @iosmers in #7817
[Unified Checkpoint] Cherry pick empty cache. by @ZHUI in #7868
Add PPO training. by @guoshengCS in #7305
Update reward_main.py by @wawltor in #7880
Update ppo_main.py by @wawltor in #7881
[LLM] revert benchmark codes by @RichardWooSJTU in #7871
[LLM]support QWenVL second part by @DanGuge in #7808
[Bug Fixes] update chatglm1 tokenizer by @wj-Mcat in #7870
【AutoParallel】Support 'master_grad' in Llama in static auto-parallelism by @heavyrain-lzy in #7658
[Bug Fix] fix slice bug in LlamaRotaryEmbedding by @MarioLulab in #7882
【AutoParallel】Support bf16 loss in static by @heavyrain-lzy in #7874
[Bug Fix] fix allreduce tensor dtype by @BeingGod in #7876
[CE] Add Qwen into CE process by @ziangqin-baidu in #7887
[Hackathon 5th No.73] ToT by @ErnestinaQiu in #7660
[CustomDevice] fix loading rng state on custom devices by @SylarTiaNII in #7894
[LLM] ...

Contributors

co63oc, zhiqiu, and 54 other contributors


v2.7.2

Released 30 Jan 07:50 by ZHUI, commit b39e701

This release contains several minor bug fixes.

What's Changed

[Unified Checkpoint] fix checkpoint names by @DrownFish19 in #7794
[Unified Checkpoint] Fix last checkpoint save by @DrownFish19 in #7810
[PEFT] Cherry pick lora fix by @lugimzzz in #7826
[Unified Checkpoint] Fix unified checkpoint by empty cache. by @ZHUI in #7855
[Fix Download] update converted logic & fix hf hub download subfolder bug by @JunnYu in #7911
[Cherry-pick] logger level by @KB-Ding in #7920
[Cherry-pick] RuntimeTimer for the toolkit (#7913) by @KB-Ding in #7921
[Release] 2.7.2 for paddlenlp bugfix. by @ZHUI in #7892

Full Changelog: v2.7.1...v2.7.2

Contributors

DrownFish19, ZHUI, and 3 other contributors


v2.7.1

Released 04 Jan 14:24 by ZHUI, commit bb9062e

This release contains several minor bug fixes.

What's Changed

Fixed several issues encountered when resuming training by @ZHUI in #7771
Fixed GPT initialization in pipeline mode by @DrownFish19 in #7775
Fixed an issue with dist dataloader evaluation by @DesmonDay in #7778

Full Changelog: v2.7.0...v2.7.1

Contributors

DrownFish19, ZHUI, and DesmonDay


v2.7.0

Released 03 Jan 04:07 by ZHUI, commit adf9e6f

PaddleNLP 2.7.0 Release Note

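We are pleased to announce the release of v2.7.0 of the PaddlePaddle large model suite. In this release we have deeply optimized the suite's LLM capabilities, with major improvements in usability, performance and stability.

Overall, this release has the following highlights:

Unified LLM toolchain entry point. The implementation code for pre-training, fine-tuning, compression, inference and deployment is unified under the PaddleNLP/llm directory.
New LLM toolchain documentation, guiding users end to end from getting started with LLMs to production deployment. Documentation: https://paddlenlp.readthedocs.io/zh/latest/llm/finetune.html
Unified Checkpoint, a universal checkpoint storage mechanism. When saving checkpoints, model weights, optimizer weights, etc. are stored uniformly in safetensors format without distinguishing the distributed strategy, and resumed training supports dynamic scale-out/scale-in, greatly improving the generality of LLM checkpoints.
Upgraded efficient fine-tuning. Efficient fine-tuning can be used together with LoRA, and algorithms such as QLoRA are supported.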

End-to-end LLM training and inference

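Pre-training
- Unified the pre-training entry point as llm/run_pretrain.py.
- Added pre-training for models such as qwen, with flash attention support.
Fine-tuning
- LoRA can be used together with Linear quantization
- Pipeline-parallel models can be used together with LoRA
- Added the NEFTune method
- Added QLoRA support
Compression
- PTQ and QAT quantization, including A8W8, WINT8, WINT4 and A8W4
- Quantization algorithms such as SmoothQuant, GPTQ and AWQ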
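WINT8/WINT4 denote weight-only quantization: weights are stored as low-bit integers with a scale, while activations stay in floating point (A8W8 additionally quantizes activations). The sketch below shows the simplest symmetric, absmax, per-output-channel scheme in plain Python; it is an illustrative assumption with hypothetical function names, not PaddleNLP's kernel, and the PTQ algorithms listed (SmoothQuant, GPTQ, AWQ) refine scale and rounding choices in ways not modeled here.

```python
from typing import List, Tuple


def quantize_per_channel(w: List[List[float]], bits: int = 8) -> Tuple[List[List[int]], List[float]]:
    """w is [out_channels][in_features]; returns integer weights and one scale per channel."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8, 7 for int4
    q: List[List[int]] = []
    scales: List[float] = []
    for row in w:
        absmax = max(abs(x) for x in row)
        scale = absmax / qmax if absmax > 0 else 1.0
        scales.append(scale)
        # Round to nearest and clamp to the symmetric range [-qmax, qmax].
        q.append([max(-qmax, min(qmax, round(x / scale))) for x in row])
    return q, scales


def dequantize(q: List[List[int]], scales: List[float]) -> List[List[float]]:
    """Recover approximate float weights: w ~= q * scale."""
    return [[v * s for v in row] for row, s in zip(q, scales)]


if __name__ == "__main__":
    w = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
    q, s = quantize_per_channel(w)
    print(q, s)
    print(dequantize(q, s))
```

Per-channel scales keep one large-magnitude row from destroying the precision of the others, which is why they are a common default for weight-only schemes.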

Unified Checkpoint

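With large models, training is usually distributed across multiple GPUs, and the model weights saved in a checkpoint are usually sharded, for example split according to tensor parallelism and pipeline parallelism. Saving checkpoints directly according to the distributed strategy is simple and straightforward, but it has the following problems:
- It is unfriendly to downstream inference: to run inference on an intermediate checkpoint, users must merge the model weights manually.
- It copes poorly with changes in the distributed strategy or in the number of training nodes when resuming training; users often have to process checkpoints manually, which adds operational complexity.
To address these problems as far as possible and simplify operations for users, we upgraded the LLM storage framework and introduced a unified LLM storage scheme: Unified Checkpoint. Its core idea is to store model weights, optimizer weights, etc. uniformly in safetensors format, no longer distinguishing the distributed strategy when saving, which improves the generality of LLM checkpoints.
Unified Checkpoint has the following features:
- Weights are stored in a unified safetensors format regardless of the distributed strategy;
- It flexibly supports scaling LLM training out or in, and adapts to switching between different distributed training strategies.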
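For tensor parallelism, storing weights "regardless of the distributed strategy" amounts to gathering each parameter's shards back into the full tensor before saving, then re-splitting it for whatever parallel degree is used on reload. The plain-list sketch below shows only that gather/re-split step. The function names are hypothetical, the split axis depends on whether a layer is column- or row-parallel, and the real mechanism also writes safetensors files and handles optimizer states and pipeline stages, none of which is modeled here.

```python
from itertools import chain
from typing import List

Matrix = List[List[float]]


def merge_tp_shards(shards: List[Matrix], axis: int) -> Matrix:
    """Gather tensor-parallel shards into the full tensor (axis 0: rows, axis 1: columns)."""
    if axis == 0:
        return [row for shard in shards for row in shard]
    if axis == 1:
        # Row i of the full matrix is row i of every shard, concatenated.
        return [list(chain.from_iterable(rows)) for rows in zip(*shards)]
    raise ValueError("axis must be 0 or 1")


def split_for_tp(full: Matrix, degree: int, axis: int) -> List[Matrix]:
    """Re-shard a full tensor for a (possibly different) tensor-parallel degree."""
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1")
    size = len(full) if axis == 0 else len(full[0])
    if size % degree:
        raise ValueError("dimension must be divisible by degree")
    step = size // degree
    if axis == 0:
        return [full[r * step:(r + 1) * step] for r in range(degree)]
    return [[row[r * step:(r + 1) * step] for row in full] for r in range(degree)]


if __name__ == "__main__":
    # A column-parallel weight saved by 2 ranks...
    shards_tp2 = [[[1.0, 2.0], [5.0, 6.0]], [[3.0, 4.0], [7.0, 8.0]]]
    full = merge_tp_shards(shards_tp2, axis=1)
    # ...stored once in strategy-agnostic form, then reloaded on 4 ranks.
    print(full)
    print(split_for_tp(full, degree=4, axis=1))
```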

New models

moka-ai/m3e-base retrieval model
BAAI/bge-small-zh-v1.5 retrieval model

Basic framework upgrades

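Trainer upgrades
- With "--skip_memory_metrics 0", real-time GPU memory and host memory usage is displayed
- "--unified_checkpoint" and "--unified_checkpoint_config" support saving models under hybrid parallelism and restarting with dynamic scale-out/scale-in.
Added the PretrainModelPipe base class to support pipeline-parallel training.
Other support
The paddlenlp commit id can be displayed via paddlenlp.version.commit
Support for downloading from and saving to the AI Studio hub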

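Bug fixes

Fixed several issues in dist_dataloader
Fixed dynamic-to-static conversion issues in some models
Fixed several bugs in GPT training and removed GPT2; fixed several seed-setting issues
Fixed several issues with the baichuan model under pipeline parallelism.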

New Contributors

@Wennie396 made their first contribution in #6897
@Wong4j made their first contribution in #7008
@yuanlehome made their first contribution in #7080
@Xreki made their first contribution in #7105
@Tom-Zheng made their first contribution in #7092
@TimeYWL made their first contribution in #7122
@From00 made their first contribution in #7168
@RichardWooSJTU made their first contribution in #7186
@heavyrain-lzy made their first contribution in #7269
@LokeZhou made their first contribution in #7337
@JZ-LIANG made their first contribution in #7301
@WAI-clear made their first contribution in #7402
@tianhaodongbd made their first contribution in #7293
@zzjjay made their first contribution in #7504
@anexplore made their first contribution in #7558
@niuliling123 made their first contribution in #7528
@zxcd made their first contribution in #7577
@MayYouBeProsperous made their first contribution in #7575
@iosmers made their first contribution in #7613
@AndSonder made their first contribution in #7343
@zhink made their first contribution in #7679
@kingTLE made their first contribution in #7708
Full Changelog: v2.6.1...v2.7.0

Contributors

anexplore, zxcd, and 20 other contributors


v2.6.1

Released 14 Sep 03:57 by sijunhe, commit fd2bed5

What's Changed

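In v2.6.1 we fixed a large number of bugs and improved the stability of LLM models and related components. Besides bug fixes, the main new features are:

LLM: added the qwen model; the InTokens data stream is now compatible with Pipeline Parallel; LLM fine-tuning supports loading from multiple training files and warm starts; added different recompute granularities for the LLaMA model.
Trainer: added the hybrid_parallel_topo_order option and fixed model saving with sharding stage3.
Paddle-pipelines: added support for ERNIE-Bot-turbo and ERNIE-embedding, updated the hierarchical search example and improved the ChatPaper UI.
Megatron datasets: added support for loading Megatron datasets, supporting the ernie-1.0 and T5 data types.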

New Contributors

@xiezheng-XD made their first contribution in #6764
@carryyu made their first contribution in #6676
@xiaoxiaohehe001 made their first contribution in #6798
@MARD1NO made their first contribution in #6865
@zhoutianzi666 made their first contribution in #6905
@lchdl made their first contribution in #6964
@LaiXinyi823 made their first contribution in #6659

Full Changelog: v2.6.0...v2.6.1

Contributors

lchdl, carryyu, and 5 other contributors


v2.6.0

Released 15 Aug 13:11 by sijunhe, commit 4353b64

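PaddleNLP 2.6 official release: a complete upgrade for the era of large models!

We are pleased to announce that PaddleNLP 2.6 has been fully upgraded and officially released! This upgrade marks our entry into the era of large models. In PaddleNLP 2.6 we introduce a brand-new PaddlePaddle end-to-end LLM toolchain covering pre-training, fine-tuning, compression, inference and deployment, giving users a complete end-to-end LLM solution.

The toolchain fully supports mainstream LLMs such as LLaMA 1/2, BLOOM, ChatGLM 1/2, GLM and OPT, so users can try a variety of LLMs at low cost with a single set of tools.

To support this toolchain, we made extensive upgrades to the underlying and core framework:

We upgraded the Trainer API into a 4D-parallel distributed Trainer, making model training more efficient.
We implemented the efficient fine-tuning algorithms LoRA and Prefix Tuning, enabling fine-tuning of 100B-scale models on a single machine.
Building on PaddleSlim's in-house quantization algorithms, we achieved lossless quantization on all supported LLMs.

All of these upgrades aim to make training, optimizing and deploying models easier for our users in the era of large models. We look forward to your trying it out and to your feedback as we advance PaddleNLP together. Between versions 2.5 and 2.6, PaddleNLP gained 40 new contributors. Thank you all for supporting PaddleNLP's open-source work!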

New Contributors

@zws-2019 made their first contribution in #5167
@qiuwenbogdut made their first contribution in #5098
@kuizhiqing made their first contribution in #5347
@46319943 made their first contribution in #5419
@jiaohuix made their first contribution in #5465
@kangguangli made their first contribution in #5438
@vivienfanghuagood made their first contribution in #5563
@zhiboniu made their first contribution in #5470
@cyber-pioneer made their first contribution in #5598
@invokerbyxv made their first contribution in #5622
@megemini made their first contribution in #5658
@zhenyun-li made their first contribution in #5683
@solrex made their first contribution in #5736
@nemonameless made their first contribution in #5487
@Yulv-git made their first contribution in #5709
@wangxinxin08 made their first contribution in #5773
@AlphaHinex made their first contribution in #5815
@houj04 made their first contribution in #5820
@Joker1718 made their first contribution in #5816
@pkuzyc made their first contribution in #5538
@jadepeng made their first contribution in #5841
@KB-Ding made their first contribution in #5886
@parap1uie-s made their first contribution in #5775
@zirui made their first contribution in #5866
@GOH-Gu made their first contribution in #5951
@yangjianfengo1 made their first contribution in #6069
@zhangting2020 made their first contribution in #5922
@rogerserper made their first contribution in #6192
@wtmlon made their first contribution in #6258
@qingzhong1 made their first contribution in #6251
@BeingGod made their first contribution in #6307
@zhiqiu made their first contribution in #6347
@DesmonDay made their first contribution in #6435
@cyk1337 made their first contribution in #6447
@lxp521125 made their first contribution in #6491
@littsk made their first contribution in #6425
@RachelXu7 made their first contribution in #6572
@wanghuancoder made their first contribution in #6539
@DrownFish19 made their first contribution in #6570
@GhostScreaming made their first contribution in #6673

Full Changelog: v2.5.2...v2.6.0

Contributors

AlphaHinex, solrex, and 38 other contributors


v2.6.0rc (Pre-release)

Released 12 Jun 03:23 by sijunhe, commit 8ddd9a8

PaddleNLP v2.6.0rc preview release

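Core changes

Full training and inference examples for the mainstream open-source LLMs bloom, chatglm, glm, llama and opt
Added the cross-modal models minigpt4 and speecht5
Trainer adds model parallelism: for models that support it, tensor parallel training can be enabled with a single switch
Added parameter-efficient fine-tuning (peft), including single-GPU and distributed LoRA and Prefix Tuning strategies, to help bring LLMs into real applications
Pipelines adds experimental LLM applications and Agents, including single- and multi-turn document question answering and ReAct-based agents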


PaddleNLP v2.5.2

Released 07 Mar 08:43 by sijunhe, commit e40e40b

New Features

PPDiffusers

Added a FastDeploy-based CycleDiffusionPipeline and a dynamic-graph CycleDiffusionPipeline, plus a Gradio interface for the dynamic-graph version #4945 #4830
Updated LoRA to support a custom lora_rank #4894 #4925
Added ControlNet with inference and training support #5009 #5090
Upgraded clip_guided_stable_diffusion, interpolate_stable_diffusion, lpw_stable_diffusion and stable_diffusion_mega in the community directory #4920 #4947

AutoNLP

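AutoNLP text classification supports inference deployment with Taskflow #4896
Support for the full train, evaluate, compress and infer workflow for text classification models with finetune and prompt tuning #4967 #4963
Support for VisualDL and for distributing training logs to each trial #4990 #5021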

Core foundation

Completed the transformers upgrade of MegatronBERT, MobileBert, Reformer, Roformerv2 and skep
Added 14 Chinese BART models #4636
Added 3 Chinese text summarization Taskflow models #4933

FastGeneration

Added a CodeGen-16B example #4895
Added FusedAttention optimization to BART FastGeneration #5111

Bug Fix

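Fixed incorrect BART FastGeneration inference results #5111
Fixed zero-shot extraction in UIE-M series models #5108
Fixed DocParser image reading and PDF scaling issues #4975
Fixed the projection dim in CLIP and ChineseCLIP so that text_config and vision_config stay consistent with previous versions #5074
Fixed a bug with GroupNorm and the framework PyLayer API when Trainer uses Sharding Stage3 #4930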
