PaddleFormers v1.2

Latest

From00 released this 19 Jun 14:10

· 12 commits to release/1.2 since this release

d46aa49

What's Changed

[CI]add build scripts by @Liujie0926 in #2433
[CI]update bucket for daily build by @Liujie0926 in #2441
Add tensor parallelism on QLoRA by @tugang-baidu in #2424
Cherry-pick hybrid expert parallel sharding_metas by @pkuzyc in #2447
Solve DPO pin-memory problem by hacking HybridParallelOptimizer by @WYB27 in #2428
lazy more_elegant by @miao200years in #2451
support multi download source by @fjjF77 in #2427
Fix paddle.distributed.checkpoint path by @xingmingyyj in #2452
fix sentencepiece.bpe.model download by @fjjF77 in #2454
hf tokenizer adaptation by @SdeeRK in #2445
Support general design for modeling by @cheng221 in #2446
[CI]add ce yml by @Liujie0926 in #2468
Legacy tokenizer by @SdeeRK in #2465
Tokenizer update by @SdeeRK in #2457
fix vl lora uc. by @wtmlon in #2463
FIX: E_cpu by @miao200years in #2475
fix import download_utils & support ci set network proxy by @fjjF77 in #2477
[BugFix] fix lazy_import error of importlib.machinery by @EmmonsCurse in #2482
[feature] lazyimport-and-tokenizer by @miao200years in #2481
model download source add ernie by @fjjF77 in #2484
fix test_configuration download model ci bug by @fjjF77 in #2488
Support HF torch load & save by @llbdyiu66 in #2437
delete import distutils in pdc_utils by @cheng221 in #2493
[feature] testcase-without-paddle by @miao200years in #2494
Try import ckpt convert by @xingmingyyj in #2476
add sink_attention by @xxyux in #2461
fix uc tp name mapping. by @wtmlon in #2502
[CI] update trigger conditions by @Liujie0926 in #2499
[FIx_v0.2] PreTrainedTokenizer by @miao200years in #2498
support ci which download models from hugging face by @fjjF77 in #2490
enhance apply_chat_template by @SdeeRK in #2513
add train sft examples by @llbdyiu66 in #2491
remove bos download by @fjjF77 in #2517
[CI]Add codecov by @Liujie0926 in #2528
add decode_token function by @SdeeRK in #2519
fix ernie4_5vl tokenizer unittest for network connection error by @fjjF77 in #2529
change apply_chat_template add_generation_prompt param by @SdeeRK in #2533
[CE]fix ce scripts by @Liujie0926 in #2534
update v0.2 by @lugimzzz in #2540
[CI] Update image by @Liujie0926 in #2542
[BugFix] fix decode_token by @yuanlehome in #2544
move text data streams from Erniekit to PaddleFormers. by @Jonathans575 in #2453
[fix] ADDTOKEN by @miao200years in #2545
Add ernie4 5 moe by @cheng221 in #2520
fix mistakes change in tp by @cheng221 in #2550
[BugFix] fix decode_token by @yuanlehome in #2553
fix_decode_token by @yuanlehome in #2559
fix general pipeline model by @cheng221 in #2560
add test_tokenizer_decode_token by @SdeeRK in #2562
【model】add Gpt oss model support sft/lora and infer by @xiaoguoguo626807 in #2555
remove old dataset. by @Jonathans575 in #2561
fix chat_template bug. by @Jonathans575 in #2552
update examples, add dpo & lora training by @llbdyiu66 in #2563
Refactoring Qwen2/3 with general design by @Ace-To-HYB in #2480
【Fix Bug】fix startend_row_indices bug by @cheng221 in #2565
update v0.3 by @lugimzzz in #2567
add estimate max_steps by @Jonathans575 in #2566
fix pp_seg_method and unify training attention with attn_impl by @cheng221 in #2572
add estimate training in dpo training by @Jonathans575 in #2573
rename loss_subbatch_seqlen to loss_subbatch_sequence_length by @cheng221 in #2581
Fix attn impl and ernie4.5 for erniekit by @cheng221 in #2580
Add sft/dpo example json by @Ace-To-HYB in #2575
bugfix about model config by @Jonathans575 in #2577
fix pretrained_config save dtype by @cheng221 in #2587
[CI] update codecov by @Liujie0926 in #2588
padding to max len in batch instead of the default config in no packi… by @Jonathans575 in #2584
feat(dsv3): add dsv3 fast pretrain into paddleformers by @chen2016013 in #2594
[CI] mv codecov from cpu to gpu by @Liujie0926 in #2596
support Glm4Moe by @WYB27 in #2554
support Glm4Moe EP mode by @danleifeng in #2603
Glm4moe supports fusedqkv and fusedffn by @gongel in #2607
【BUG】fix save torch_dtype in trainer by @cheng221 in #2609
Remove train config by @Ace-To-HYB in #2598
[CI] remove unittest-cpu by @Liujie0926 in #2610
Update mergekit with hf lora model by @llbdyiu66 in #2585
[CI] update ci docker image build by @Liujie0926 in #2614
Update README and add download source by @Ace-To-HYB in #2616
[Fix Bug]Fix LM Head dimension mismatch issue by @Ace-To-HYB in #2622
Glm4Moe fix attn_impl && remove config by @WYB27 in #2624
[CI]fix docker build and add allure by @Liujie0926 in #2626
use key-value to init dataloader by @Waynezee in #2537
Fix fused func for Glm4MoeForCausalLMPipe by @DrownFish19 in #2619
bugfix: ernie dataset tests by @Jonathans575 in #2627
remove flashmask checker by @WYB27 in #2631
【GPT-OSS】update sliding_attention layer use flashmask by @xiaoguoguo626807 in #2606
[Bug] Fix precision of gate and e_score_correction_bias in Glm4Moe by @DrownFish19 in #2637
Add Qwen3 download source by @Ace-To-HYB in #2638
[CI] update pytest config by @Liujie0926 in #2620
examples update yaml training config by @llbdyiu66 in #2644
fix save tensor dtype by @llbdyiu66 in #2642
Fix the issue of loading ckpt when retraining by @Ace-To-HYB in #2645
API Update: Modify concat and equal usage by @Ace-To-HYB in #2652
modify aistudio download adr by @a31413510 in #2654
fix ernie4_5 moe_layer to allgather by @cheng221 in #2633
Chore/update scripts floder by @huanghengheng in #2660
modify test problem by @a31413510 in #2662
[BUG] Improve the precision of GLM4 MOE Rope by @JunnYu in #2663
Add sliding window and optimize redundant code by @Ace-To-HYB in #2655
Support safetensors.paddle without coverting to numpy by @llbdyiu66 in #2538
[Unified Checkpoint] Support deepep by @DesmonDay in #2623
【GLM】subbatch performance and weight bug fix by @danleifeng in #2661
Remove PP judgment for lora in Qwen2/Qwen3 by @Ace-To-HYB in #2669
add no template by @Jonathans575 in #2676
add thinking dataset support + bugfix by @Jonathans575 in #2672
Revert "Support safetensors.paddle without coverting to numpy (#2538)" by @llbdyiu66 in #2678
Fix bug of missing trainer attribute by @Ace-To-HYB in #2684
[Fix bug]Add generation module in init.py file by @fangfangssj in #2608
fix ernie4.5 moe bug by @cheng221 in #2689
fix transpose key by @cheng221 in #2693
Block-hf-download-in-ci by @huanghengheng in #2696
fix decode token by @yuanlehome in #2698
Qwen2/3 modeling alignment implementation by @Ace-To-HYB in #2694
Fix conflict bug in DPO by @Ace-To-HYB in #2705
Modularize Causal Mask by @Ace-To-HYB in #2707
Feat(validation): Add error logging for unsupported config in pp by @Ace-To-HYB in #2710
add thinking model dpo training by @Jonathans575 in #2709
Fix(logging): fix issue in pipeline parallelism error logging by @Ace-To-HYB in #2711
【GLM】fix ep callback and pp moe_subbatch_token_num by @danleifeng in #2687
Glm4moe fix tp+ep+sp by @WYB27 in #2621
Feat/model unittest ci action by @huanghengheng in #2683
add function call training docs. by @Jonathans575 in #2714
Update lora layer source by @emmanuel-ferdman in #2489
fix moe_subbatch_token_num conflict by @zjjlivein in #2719
Feat(examples/README): Update dataset acquisition instructions by @Ace-To-HYB in #2715
Support safetensors.paddle without coverting to numpy v2 by @llbdyiu66 in #2724
fix seq_aux_loss with sp && add gate weight allreduce callbacks by @deepllz in #2723
fix dpo training parallel by @llbdyiu66 in #2717
Fix(config): fix attribute error in PretrainedConfig by @Ace-To-HYB in #2731
fix callbacks by @deepllz in #2729
Support fleety_20250421 Paddle by @gongel in #2732
[dsv3]Move dsv3 model from paddlenlp-dsv3-sft by @Difers in #2593
subbatch cast logits to fp32 in dpo_loss by @cheng221 in #2716
Add gather_split_param arg for sharding stage1 v2 in unified checkpoint by @DesmonDay in #2734
Feat(ckpt): Optimize checkpoint loading and saving to support multimodal models by @Ace-To-HYB in #2725
add command line tools (version1 : only include training and export function) by @Jonathans575 in #2720
Add copy local weight files by @huanghengheng in #2733
【GLM4.5】fix glm tp+sp hang by @danleifeng in #2735
fix seq_aux_loss by @Difers in #2736
dpo train add tp+pp+sp by @llbdyiu66 in #2728
【gpt-oss】Add Fp4 to bf16 test by @xiaoguoguo626807 in #2712
fix ernie4.5 dpo in formers by @cheng221 in #2743
【GLM】fix glm sp bug by @danleifeng in #2746
bugfix(cli): ernie sft train by @Jonathans575 in #2740
bugfix(cli dpo) by @Jonathans575 in #2748
fix load state_dict bug by @llbdyiu66 in #2751
padding to max_seq_len when packing or enable sp by @Jonathans575 in #2749
【gpt-oss】add weight change readme by @xiaoguoguo626807 in #2753
Fix(qwenmoe): Fix SP issue in Qwen Moe by @Ace-To-HYB in #2741
fix CE bug by @chen2016013 in #2745
Fix norm sp by @cheng221 in #2755
【GPT-oss】Support sequence parallel by @xiaoguoguo626807 in #2730
Fix(ReadME): Add evaluation mode to example by @Ace-To-HYB in #2759
Fix(optimizer): Fix import issue in moe_hybrid_parallel_optimizer by @Ace-To-HYB in #2763
【gpt-oss bf16 to fp4】fix 3.2.1 paddle by @xiaoguoguo626807 in #2764
Revert "Support safetensors.paddle without coverting to numpy v2 (#27… by @llbdyiu66 in #2756
[Fix] With TP-SP, use the callback method for sharding stage1 or dp, and the hook method in all other cases, to synchronize sp gradients by @JunnYu in #2765
Feat/add tp pp fc by @huanghengheng in #2768
Fix load_hf_ckpt in paddleformers by @chen2016013 in #2744
fix merge model dtype bf16 by @llbdyiu66 in #2773
Fix dsv3 pretrain to runnable state by @chen2016013 in #2776
raise error in LMHead when vocab_size cannot divide by tp degree by @cheng221 in #2779
fix fused_head_and_loss_fn bug by @danleifeng in #2774
Glm4Moe: add Aux Loss by @WYB27 in #2784
fix merged_gqa_qkv tp splits by @llbdyiu66 in #2783
Synchronize pr 11095(PaddleNLP) by @chen2016013 in #2786
Synchronize pr 11097(PaddleNLP) by @chen2016013 in #2778
[AutoParallel] migrate code from paddlenlp by @xuxinyi389 in #2795
ci-upload-coverage-pending-bug by @huanghengheng in #2772
Adapt to non-orthogonal context parallel by @umiswing in #2796
[FEA] PP prepare_inputs add autocast by @umiswing in #2797
[CherryPick] Add nccl_config for Paddle by @Waynezee in #2794
cherry-pick: PaddlePaddle/PaddleNLP#10896 by @ZHUI in #2793
add paddleformers dataset debug by @Jonathans575 in #2766
modify use_quick_lora to False by @Jonathans575 in #2767
LoRA: add lora target modules support for qwen2 and qwen3 dense model by @aiyinyuedejustin in #2804
Synchronize pr 11031(PaddleNLP) by @chen2016013 in #2782
Synchronize pr 11033(PaddleNLP) by @chen2016013 in #2780
fix warm start loss bug by @chen2016013 in #2785
[Auto-Parallel] add llama auto_parallel by @Xing-lil in #2791
fix xpu not support bf16 by @llbdyiu66 in #2806
Upgrade swanlab code by @JunnYu in #2810
[PaddleNLP->PaddleFormers] Remove restrictions on the use of allgather_overlap #10741 by @miao200years in #2811
fit xpu not support bf16 by @llbdyiu66 in #2809
fix reorder_pipeline_priority bug by @chen2016013 in #2807
add pretrain dataset by @Jonathans575 in #2803
feat/Add-dop-test-model-resume by @huanghengheng in #2802
[PaddleNLP->PaddleFormers] Support param and moment sync for pp && refine training_pipeline_step to avoid memory leaks by @zhangyuqin1998 in #2816
Support PF_HOME through os env by @zjjlivein in #2798
【Pretrain】fix pipe parallel bug when Pretrain by @xiaoguoguo626807 in #2819
bugfix: only the last round of dialogue is involved in the calculatio… by @Jonathans575 in #2691
fix cli config by @Jonathans575 in #2820
pretrain bugfix + add demo data by @Jonathans575 in #2824
[Patch-Sync] cherry-pick useful PRs from paddlenlp by @SylarTiaNII in #2827
Cherry-pick some PRs from PaddleNLP by @sneaxiy in #2821
Make sharding_first by default by @sneaxiy in #2822
fix run_finetune config problem by @Jonathans575 in #2831
delete cat by @huanghengheng in #2828
【gpt-oss】change less weight by @xiaoguoguo626807 in #2834
[Unified MoE Layer]: Add MoE Layer with DeepEP EP Support; Add Qwen3MoE EP by @hushenwei2000 in #2702
fix cross_entropy_with_softmax bug by @chen2016013 in #2836
Fix-build-image-bug by @huanghengheng in #2838
Add more args to Qwen2ForCausalLM by @Jason233333 in #2839
Fix the bug when creating position_ids by @pkuzyc in #2843
Trans ernie by @FeixLiu in #2833
add ckpt convert for dsv3 by @Difers in #2845
fix moe_gate by @Difers in #2844
docs change by @w-yyh in #2818
GLM4Moe: add dpo flashmask by @WYB27 in #2850
training dpo remove flashmask check by @llbdyiu66 in #2854
[Unified MoE Layer]: Add MoE AllToAll EP by @hushenwei2000 in #2849
[Unified MoE Layer]: Support GLM4.5 by @hushenwei2000 in #2842
Fix DPO NONE_CHAT_TEMPLATE by @WYB27 in #2860
Fix the non-convergence in DSV3 post-pretrain by @chen2016013 in #2856
revert lora_target model by @zjjlivein in #2853
[DSV3]add native moe & refine codes by @Difers in #2775
Glm4Moe: add unittest by @WYB27 in #2737
[DSV3]Fix embedding by @Difers in #2863
Feat(processor): Refactor AutoProcessor for multi-modal (Image/Video) support by @Ace-To-HYB in #2750
ernie support flex_checkpoint by @blacksheep-Aristotle in #2862
Fix(DPO): Resolve DPO training loss diff in Pipeline Parallelism mode by @Ace-To-HYB in #2872
add ernie4_5_moe_vl by @BossPi in #2870
fix export in cli by @Jonathans575 in #2873
【FC】fix pretrain for resume_from_flexcheckpoint by @xiaoguoguo626807 in #2874
fix:the incorrect saving of '_attn_implementation' by @w-yyh in #2866
Feat/add pt by @huanghengheng in #2865
Update moe model save for moe_sharding by @DrownFish19 in #2846
Revert "fix:the incorrect saving of '_attn_implementation'" by @huanghengheng in #2886
Revert "add ernie4_5_moe_vl" by @huanghengheng in #2883
Docs(processor): Add some Introduction for Processor by @Ace-To-HYB in #2879
Fix(CI): Delete some CI Package by @Ace-To-HYB in #2887
Fix ZCC EMA GPU alloc bug by @sneaxiy in #2880
adapter flex_checkpoint by @xingmingyyj in #2460
save lora using fc and add signal by @changeyoung98 in #2867
A temporary solution for the 'hack_unload_optimizer' feature by @miao200years in #2888
Fix(processor): Remove unnecessary testing code in ProcessorMixin by @Ace-To-HYB in #2889
Add some performance opt flags for dsv3 pretrain and support cli by @zhangbo9674 in #2877
fix fp8 dtype_byte_size support by @llbdyiu66 in #2895
Fix PipelineDatasetPreprocessor bug with dualpipev by @zhangbo9674 in #2893
【FlexCheckpoint】Fix fc adaptation logic and support qwen3moe by @xingmingyyj in #2892
add tiny-random-glm4moe ci by @huanghengheng in #2855
When lora is enabled on glm 4.5 air, bypass the gate's frozen gradients to prevent hang-2 by @aiyinyuedejustin in #2896
add ernie4_5_moe_vl by @BossPi in #2890
Feat/add tiny random glm4moe by @huanghengheng in #2909
【gpt-oss】fix gpt-oss recompute error by @xiaoguoguo626807 in #2894
Fix some comment and add flex_token for deepseekv3 pretrain moe_layer by @zhangbo9674 in #2913
fix export tokenizer.json bug by @WYB27 in #2882
Support glm lora resume using FC by @changeyoung98 in #2911
[Unified MoE Layer] Fix _cal_seq_aux_loss Function by @hushenwei2000 in #2901
Feat/add tiny random glm4moe by @huanghengheng in #2915
Fix(test): Set tmpdir for save_pretrained in unittest by @Ace-To-HYB in #2916
modify the datasets import by @Jonathans575 in #2917
[DSV3]add dsv3-sft test case by @Difers in #2891
fix GLM subbatch in lora mode and fix SFT&DPO callbacks by @danleifeng in #2840
fix group_mask by @Difers in #2919
support hybrid_parallel_expert_grad_scale by @AlAuAu in #2900
【FlexCheckpoint】fix param attr bug by @xingmingyyj in #2920
Add Non ZCC EMA callback by @sneaxiy in #2923
update version v0.4 by @lugimzzz in #2937
add training speed indicator by @llbdyiu66 in #2936
[fea] support dp-moe for zcc and global_expert_id by @FelliYang in #2812
[AutoParallel] Refactor trainer to support auto-parallel with intermediate_api by @waliwali777 in #2801
Fix(Qwen & SFT): Update Qwen3Moe tiny model(v2) & fix SFT PP consistency by @Ace-To-HYB in #2904
fix unsavable key '_attn_implementation' and change the CI EXPECTED_RESULT by @w-yyh in #2918
chunk offload optimizer for paddleformers by @Wennie396 in #2898
Add DPO function-call dataset and change part of full_funciton_call.yaml by @w-yyh in #2857
[Auto-Parallel] adapt flex_checkpoint save/load by @Xing-lil in #2869
fix load hf ckpt core dump by @chen2016013 in #2943
fix loss mask bug in dataflow when using no template by @Jonathans575 in #2947
[AutoParallel] fix trainer offload opt params bug by @waliwali777 in #2944
fix dmodel download proxy by @Jonathans575 in #2952
Fix FC lora resume loss by @changeyoung98 in #2950
Add documentation for the new dataflow by @Jonathans575 in #2959
Feat(image_processor): Support locally registered image processors by @Ace-To-HYB in #2960
add log + truncate long data by @Jonathans575 in #2830
Fix-update-md-no-ci by @huanghengheng in #2962
Add example glm45 by paddlefleet by @From00 in #2957
Fix EMA bug when load different strategies by @sneaxiy in #2963
Fix(processor): Prevent MRO error by checking for existing image processor class & Skip hanging test case by @Ace-To-HYB in #2964
fix cli batch_size by @llbdyiu66 in #2951
[use cli]remove training scripts by @llbdyiu66 in #2969
fix codecov without base report by @zjjlivein in #2942
[CI] check requirements change by @zjjlivein in #2848
kl-cpt dataflow. by @wtmlon in #2742
add ep config by @zjjlivein in #2977
Feat/add ep config by @huanghengheng in #2975
fix xpu merge lora with uint16 by @llbdyiu66 in #2976
fix:is_causal bug by @w-yyh in #2932
add the is_casual_mask by @w-yyh in #2983
add glm fuse qkv params by @zjjlivein in #2982
Enable FC for paddlefleet by @changeyoung98 in #2988
Support GLM4.5 in PaddleFleet by @xuxinyi389 in #2991
[Llama] Refactor Llama by @LittleHeroZZZX in #2770
modify config of glm by @xuxinyi389 in #2995
[FIX] replace imported function with environment variable check by @LittleHeroZZZX in #2999
GLM4Moe: fix qk_norm by @WYB27 in #2998
EB4.5 supports SFT dataflow by @lshpku in #2978
pretrain dataflow add truncate_packing and use_global_causal_attn by @Jonathans575 in #2980
Fix(processor): Resolve AutoProcessor Error when loading tokenizer by @Ace-To-HYB in #3000
Feat(models): Support Qwen2.5-VL model by @Ace-To-HYB in #2965
dynamicCache first try by @w-yyh in #2961
Add Qwen3-Moe fuse qkv/ffn modeling by @llbdyiu66 in #2981
[Phi4]Add phi4 model by @zhanghonggeng in #2790
add fc into zcc by @liufengwei0103 in #2971
fix single card run in paddlefleet example by @huangjiyi in #2996
unify moe_param grad scale by @AlAuAu in #3012
revert _wrap_model_and_load_sharded_checkpoint by @Xing-lil in #3016
【gpt-oss】add fc aoa by @xiaoguoguo626807 in #3006
【Gemma3】add gemma3(text) by @lijialin03 in #2817
attention alignment by @cjw-d in #2966
fix transformers infer by @BossPi in #2984
Fix(processor): Resolve tokenizer loading issue by @Ace-To-HYB in #3017
Rope reproduction by @cjw-d in #2945
Refactor: Centralize and Extend update_model_kwargs_for_generation Logic by @w-yyh in #2972
[Llama3] Fix: add pad token fallback and improve tensor reshaping by @LittleHeroZZZX in #3025
rename config to align paddlefleet by @Hz188 in #3022
fix/repair-ci by @huanghengheng in #3043
apply MAPPING_SPACIAL_KEY to NAME_MAPPING by @lijialin03 in #3018
fix/update ernie4_5 apply_fused_rope by @cjw-d in #3044
【Gemma3】adapting to fuse_attention & del _get_tensor_parallel_mappings & update unittest by @lijialin03 in #3039
Fix(image_processor): Remove download code in image processor by @Ace-To-HYB in #3029
add-dpo-fc by @huanghengheng in #2985
【GPT-OSS】sink_flahmask with packing=False raise error by @xiaoguoguo626807 in #3019
[Llama3] Fix fc when lm_head not exists by @LittleHeroZZZX in #3053
Add global param to refined_recompute by @DongBaiYue in #2899
fix 4.5_vl xpu support by @DongBaiYue in #2968
【FlexCheckpoint】fix save hf bug by @xingmingyyj in #3030
Add glm single card by @Waynezee in #3047
adapt fc to sharding stage3 by @zty-king in #2987
cover newly added models and improve tensor reshaping by @cjw-d in #3026
Test(qwenvl): Add unittest for vision_process in QwenVL by @Ace-To-HYB in #3058
[cli]dpo add continue_training by @llbdyiu66 in #3061
merge lora support fuse qkv/ffn by @llbdyiu66 in #3038
[DSV3]Add documentation by @Difers in #2777
check release pr by @zjjlivein in #3050
add qwen single card test for paddlefleet by @huangjiyi in #3059
【Gptoss】 paddle fav3 update param , fix error by @xiaoguoguo626807 in #3066
wa save hf function by @liufengwei0103 in #3063
fix codecov report do not upload by @zjjlivein in #3052
align the return value due to DynamicCache by @w-yyh in #3031
Fix(qwen2.5vl): Fix Qwen2.5-VL unittest and update baseline by @Ace-To-HYB in #3046
Decoupled Causal and Sliding Attention Mask Generation and Added Custom Mask Overlays. by @w-yyh in #2994
refactor of update_model_kwargs_for_generation: for new model by @w-yyh in #3034
Fix: enable GQA in scaled dot-product attention by @LittleHeroZZZX in #3057
Fix(qwen2.5vl): Fix Qwen2.5-VL recompute argument errors by @Ace-To-HYB in #3023
add by @w-yyh in #3033
Add sec token print by @llbdyiu66 in #3008
fix deterministic by @Waynezee in #3056
Add Qwen3-Next model by @lshpku in #2754
update codecov fail ratio by @zjjlivein in #3072
Ernie processor by @BossPi in #3037
【Gemma3】update CI test by @lijialin03 in #3081
Config for GLM45 sequence parallel by @pkuzyc in #2986
fix single card run by @huangjiyi in #3083
[Model loading] Fix bug where, with fused + pp enabled, keys (q / kv) not in the same safetensor were skipped and randomly initialized by @aiyinyuedejustin in #3067
update RotaryEmbedding for multiple models by @cjw-d in #3076
Update requirements.txt by @a31413510 in #3040
align config by @Waynezee in #3090
add paddlefleet by @swgu98 in #3077
New dataflow: support more data formats, custom chat templates, multimodal data, etc. by @Jonathans575 in #3080
modify reshape to view by @cjw-d in #3088
Fix recompute error in pp_model when lora is enabled by @aiyinyuedejustin in #3093
Change fuse_rms_norm api in deepseek-v3 pretrain by @zhangbo9674 in #3086
Ernievl aoa by @BossPi in #3097
dataflow: Adaptively read json and jsonl by @Jonathans575 in #3106
merge save hf and save full by @liufengwei0103 in #3073
FleetModel Dpo, AutoModel => FleetModel. by @wtmlon in #3024
Fix key name error of fc in zcc by @liufengwei0103 in #3095
default initialize paddlefleet by @huangjiyi in #3100
add fuse attn qkv/ffn config by @llbdyiu66 in #2979
[DSV3]remove dsv2 folder by @Difers in #3027
add&clean fused qkv/ffn configs by @llbdyiu66 in #3117
Update README.md by @swgu98 in #3111
fix save load bf16 sharding opt in fc zcc by @liufengwei0103 in #3121
【fleet】add Set random seed in workflow by @xiaoguoguo626807 in #3003
revert Uc because fc hang when FLAGS_cudnn_deterministic=1 by @xiaoguoguo626807 in #3118
Support fleet save_pretrained from_pretrained by @changeyoung98 in #3109
fix paddlefleet config by @Waynezee in #3098
[Feat] Support PaddleOCR-VL model by @forBlank in #2974
add vl model sft yaml by @Jonathans575 in #3110
refactor by @w-yyh in #3102
Fix gpt_provider import in examples/experiments/paddlefleet by @ooooo-create in #3126
Unified Config by @xuxinyi389 in #3119
add cp tools for fleet cp by @Wennie396 in #3123
fix get local_rank bug in save hf by @liufengwei0103 in #3124
Resolve the issue of parameter accuracy in dev branches by @miao200years in #3108
glm45 support pipeline parallel by @LiYuRio in #3082
fix save hf step default by @xingmingyyj in #3131
【fleet】fix Fleet lora model by @xiaoguoguo626807 in #2997
fix qwen2 pp model with aoa by @llbdyiu66 in #3127
align rope by @cjw-d in #3101
Pass attn mask in the pretraining online + offline dataflows by @Jonathans575 in #3137
【Gemma3】Update CI:add test_dtype and fix no attention test by @lijialin03 in #3112
update rope in paddleocr by @cjw-d in #3146
【lora】delete nouse llm target_modules by @xiaoguoguo626807 in #3125
Update CI by @swgu98 in #3114
fix scale bug by @Wennie396 in #3141
remove requirements-dev.txt by @zjjlivein in #3147
add glm4.5 yaml by @risemeup1 in #3129
add mask for run_pretrain.py data by @Wennie396 in #3152
Feat(VL Training)Support Qwen2.5-VL training and freeze MLLM modules by @Ace-To-HYB in #3153
[AutoParallel] Refactor llama3.1 model in intermediate api by @sevenan2 in #3116
Update fleet version by @swgu98 in #3154
REMOVE unused code by @BossPi in #3079
【Gpt oss】update flashv2 by @xiaoguoguo626807 in #3145
hard code for glm provider by @BossPi in #3163
Cherry-pick timers and CPU memory logs by @sneaxiy in #3162
align pp rope dtype by @llbdyiu66 in #3155
unify arguments:tp pp vp cp ep moe_subbatch_token_num by @Feiye0979 in #3160
Update glm provider transform_rules by @Feiye0979 in #3172
[DSV3]Fix some test case by @Difers in #3148
Add qwen30 b benchmark by @risemeup1 in #3175
【fleet 】fleet update pp_model, fix some bug by @xiaoguoguo626807 in #3165
disable test_lorapro by @zjjlivein in #3183
Fix for pr #3130 by @cjw-d in #3174
remove bert && redundant code by @a31413510 in #3138
glm45 support pipeline parallel by @LiYuRio in #3158
ernie45/moe support fc and fused qkv/ffn by @llbdyiu66 in #3140
【fleet】fix pp>1 some bug by @xiaoguoguo626807 in #3177
adapt fp8 offline quant by @Waynezee in #3151
[Unified MoE] Set Expert fuse_up_gate Parameter for All Model by @hushenwei2000 in #3191
fix missing use_cache in model_kwargs by @cjw-d in #3185
Migrate FlexCheckpoint functionality from paddlenlp to PaddleFormers by @xingmingyyj in #3193
Align qwen3-0.6b dataflow with swift by @Jonathans575 in #3149
Unification of device APIs by @fxyfxy777 in #3195
change unified fuse params by @llbdyiu66 in #3103
add saved_signal by @liufengwei0103 in #3202
disable only sharding opt in saving stage by @liufengwei0103 in #3201
rm ops && qwen && trl by @a31413510 in #3187
dataflow modify with deepseek, glm ... by @Jonathans575 in #3212
[Llama3] Fix sft function call CUDA ERROR 700 by @LittleHeroZZZX in #3184
decouple sharding_io and non_zcc ema by @liufengwei0103 in #3208
support profile by @xuxinyi389 in #3220
fix glm pp vpp configuration by @LiYuRio in #3219
【fleet】fix lora_A not merge by @xiaoguoguo626807 in #3216
change tiny-random-qwen3 aistudio addr by @a31413510 in #3218
persistent dataloader workers when workers > 0 by @xuxinyi389 in #3217
Fix(Qwen): fix Qwen2/3 MoE when deterministic by @Ace-To-HYB in #3214
rm blob requirement by @a31413510 in #3222
fleet args integration. by @wtmlon in #3173
fix load ema and model_meta path by @xingmingyyj in #3225
change test_load_from_hf model by @a31413510 in #3229
modify warn to warning by @cjw-d in #3237
fix ernie45_moe lora training modules by @llbdyiu66 in #3230
Fix(trainer): Remove redundant tokenizer saving and support processor saving by @Ace-To-HYB in #3236
fix bug for kvcache by @w-yyh in #3215
[Qwen3MoE] support subbatch by @cjw-d in #3240
[Dataflow] Support PaddleOCR-VL Template and MMPlugin by @forBlank in #3164
fix cpt bug. by @wtmlon in #3242
fix fleet args by @Waynezee in #3244
Align text-only dataflow with swift + fix function call + add grounding capability by @Jonathans575 in #3241
Feat(Qwen2.5-VL): Add fuse qkv/ffn config and fix some issue for Qwen2.5-VL by @Ace-To-HYB in #3248
Ernie datastream by @BossPi in #3253
Remove shift one in PaddleOCR-VL by @forBlank in #3255
fix bug of bf16 opt by @liufengwei0103 in #3256
Fix dataloader by @xuxinyi389 in #3249
Bug fix: process vision data by @BossPi in #3260
For compatibility with old LSE shape (seqlen_q_rounded) for FA2 on A GPU by @GuoxiaWang in #3259
Fix(model): Fix vllm ckpt loading and dtype error by @Ace-To-HYB in #3262
[CI] fix codestyle by @zjjlivein in #3267
[Fleet] Disable Default moe_grouped_gemm by @hushenwei2000 in #3247
fix xpu offload device wrong by @jianingyu-ustc in #3180
fix model is_fleet by @Wennie396 in #3252
adapt aoa reverse by @zty-king in #3055
Fix(qwen3moe): Fix argument error in Qwen3Moe recompute by @Ace-To-HYB in #3269
[Fleet GLM 4.5]Fix lm_head aoa by @changeyoung98 in #3250
add cli fleet_args by @Waynezee in #3272
rename pp empty layer config by @Hz188 in #3273
[Bug Fix] Fix error raised when multimodal data contains no video by @BossPi in #3271
[dataflow] fix glm template and mm plugin by @Jonathans575 in #3264
Fix(CI): Fix sub_config comparison logic in save/load case by @Ace-To-HYB in #3278
fix transformer import NameError by @zjjlivein in #3284
update config name from paddle fleet by @Hz188 in #3282
fix fc load in sft sharding stage3 by @Xing-lil in #3281
Fix sharding3 save freeze_param by @Xing-lil in #3270
rm unused cli hparams by @a31413510 in #3290
fleet args update. by @Feiye0979 in #3261
set minimum value of reader_buffer_size by @xuxinyi389 in #3285
[bug fix] change default None value to 0 by @Hz188 in #3292
Fix(processor): Fix video metadata type conversion conflict by @Ace-To-HYB in #3301
fix(llama): align to other model implementation, use dynamic shape in attention reshape by @jackyYang6 in #3289
【FlexCheckpoint】Support online save ema by @xingmingyyj in #3275
Revert "fix transformer import NameError" by @ooooo-create in #3299
[CI]add fleet ci by @tianlef in #3171
fix pretrain truncate packing by @Jonathans575 in #3303
align recompute by @Waynezee in #3283
[CI] update transformers by @zjjlivein in #3296
add_paddlefleet_qwen3moe by @huangjiyi in #3295
Support PaddleOCR-VL LoRA target_modules by @forBlank in #3277
Revert PR #3056 && PR #3214 by @Ace-To-HYB in #3305
【Fleet】update aoa config to support num_remove_layers in ppmodel by @xiaoguoguo626807 in #3276
[CI] fix nvidia runtime by @zjjlivein in #3316
[Config] Refactoring MoE Configs by @hushenwei2000 in #3291
fix docs by @xuxinyi389 in #3300
[dev#Feature] Only the environment Paddlefleet Available can call Paddlefleet by @miao200years in https://github.com/PaddlePaddle/PaddleFormers/pull/3315
[Unified MoE Layer] Add moe_deep_gemm Config by @hushenwei2000 in https://github.com/PaddlePaddle/PaddleFormers/pull/3318
add fleet yaml by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3323
[CI]fix fleet ci uv time by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/3328
add chat_template.jinja to copy_file_list in run_export function by @cjw-d in https://github.com/PaddlePaddle/PaddleFormers/pull/3310
fix get_rope_index bug by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/3319
Feat(vision): Update Resize API and add paddlecodec video backend by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3075
[Config] Refactoring MoE Configs (Part2) by @hushenwei2000 in https://github.com/PaddlePaddle/PaddleFormers/pull/3329
change path of "check_loss" file by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3330
fix qwen2/3-moe gate dtype by @llbdyiu66 in https://github.com/PaddlePaddle/PaddleFormers/pull/3226
update_benchmark_qwen_yaml by @huangjiyi in https://github.com/PaddlePaddle/PaddleFormers/pull/3338
fix online save ema state by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/3312
[Feat] Add Qwen3-vl by @w-yyh in https://github.com/PaddlePaddle/PaddleFormers/pull/3190
Update GLM4.5-Air.yaml for benchmark by @ooooo-create in https://github.com/PaddlePaddle/PaddleFormers/pull/3337
not use bf16 opt when use lora by @liufengwei0103 in https://github.com/PaddlePaddle/PaddleFormers/pull/3340
Fix(tokenizer): Resolve download and import errors for PreTrainedTokenizer by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3346
fix dpo case by @Difers in https://github.com/PaddlePaddle/PaddleFormers/pull/3221
fleet pp model dpo by @wtmlon in https://github.com/PaddlePaddle/PaddleFormers/pull/3188
Fix CI loss for fleet by @changeyoung98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3356
block torch dev by @miao200years in https://github.com/PaddlePaddle/PaddleFormers/pull/3354
[Fleet CI]add qwen3 cli by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/3347
[CI] glm pt fp8 cli by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3357
[Bug fix] Training Ernie4.5-VL by @BossPi in https://github.com/PaddlePaddle/PaddleFormers/pull/3343
Fix(processor): Resolve video processor error when fallback to CPU by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3350
Test(qwen25vl): Update unit test baseline and training yaml by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3353
Fix(args): Resolve backend conflict between PreTrainedTokenizerFast and video loader by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3355
fix ce. by @wtmlon in https://github.com/PaddlePaddle/PaddleFormers/pull/3362
PP_Model: fix recompute in no_grad bug by @WYB27 in https://github.com/PaddlePaddle/PaddleFormers/pull/3326
fix torchvision by @miao200years in https://github.com/PaddlePaddle/PaddleFormers/pull/3368
rm some try import by @a31413510 in https://github.com/PaddlePaddle/PaddleFormers/pull/3360
Feat(ckpt): Optimize checkpoint loading and saving to support multimodal models(LoRA training) by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3380
Qwen3Next supports flex_checkpoint by @lshpku in https://github.com/PaddlePaddle/PaddleFormers/pull/3370
open TopKRouter by @xuxinyi389 in https://github.com/PaddlePaddle/PaddleFormers/pull/3373
[CI] add qwen multi card by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3366
Update setup.py rm cython by @a31413510 in https://github.com/PaddlePaddle/PaddleFormers/pull/3379
Fix dataloader when dataloader_num_workers > 0 by @xuxinyi389 in https://github.com/PaddlePaddle/PaddleFormers/pull/3374
[CI] update precision for qwen by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3383
Fix(processor): Decouple transformers version via local AutoConfig by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3377
[Fleet CI]end-to-end-pipeline by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/3336
[MoE Layer] Add EP Communication Barrier by @hushenwei2000 in https://github.com/PaddlePaddle/PaddleFormers/pull/3371
update sharding save convert2cpu=True by @Xing-lil in https://github.com/PaddlePaddle/PaddleFormers/pull/3344
[CI] Add grouped gemm Integrated Test by @hushenwei2000 in https://github.com/PaddlePaddle/PaddleFormers/pull/3384
[Unified MoE Layer] Fix EP Hang when No Tokens are Distributed by the Rank by @hushenwei2000 in https://github.com/PaddlePaddle/PaddleFormers/pull/3010
add trans_paddle2torch script by @llbdyiu66 in https://github.com/PaddlePaddle/PaddleFormers/pull/3358
add fa_version arg by @Wennie396 in https://github.com/PaddlePaddle/PaddleFormers/pull/3325
Support PaddleOCR-VL freeze_config setting & Update unit test model and baseline by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/3394
get partial_rotary_factor from config by @cjw-d in https://github.com/PaddlePaddle/PaddleFormers/pull/3367
[CI]update timeout by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/3361
unify forward function of rotary embedding by @cjw-d in https://github.com/PaddlePaddle/PaddleFormers/pull/3395
support saving tokenizer and processor by @cjw-d in https://github.com/PaddlePaddle/PaddleFormers/pull/3331
fuse_linear by @cjw-d in https://github.com/PaddlePaddle/PaddleFormers/pull/3385
Fix(bug): Resolve some trainer and paddlecodec error by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3410
fix: deepseek-v3 dpo pp by @cjw-d in https://github.com/PaddlePaddle/PaddleFormers/pull/3401
[Feat]Add Qwen3VLMoe by @w-yyh in https://github.com/PaddlePaddle/PaddleFormers/pull/3324
fix(masking_utils): bypass mask generation for non-eager attention implementations by @jackyYang6 in https://github.com/PaddlePaddle/PaddleFormers/pull/3391
[DSV3]fix some error by @Difers in https://github.com/PaddlePaddle/PaddleFormers/pull/3399
Change the PEFT folder to Lazy Import format by @miao200years in https://github.com/PaddlePaddle/PaddleFormers/pull/3413
Update workflow.py by @a31413510 in https://github.com/PaddlePaddle/PaddleFormers/pull/3386
fix fa_version bug by @Wennie396 in https://github.com/PaddlePaddle/PaddleFormers/pull/3409
support Non-Zcc save EMA and fix scaling hang by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/3396
[Fleet] fix lora bug by @xiaoguoguo626807 in https://github.com/PaddlePaddle/PaddleFormers/pull/3416
[Fix]Qwen3vl dense by @w-yyh in https://github.com/PaddlePaddle/PaddleFormers/pull/3417
[PaddleFleet]Add aoa for group_gemm by @changeyoung98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3425
Ci/fleet precision by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3419
[dataflow] fix the distributed dataloader bug. by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/3421
[Fix]fix rope of qwen3vlmoe by @w-yyh in https://github.com/PaddlePaddle/PaddleFormers/pull/3414
fix automodel by @BossPi in https://github.com/PaddlePaddle/PaddleFormers/pull/3113
fix save done signal pos by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/3308
【CI】update precision approval by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3429
Paddlefleet adaptation @dev#feature by @miao200years in https://github.com/PaddlePaddle/PaddleFormers/pull/3430
[Fix]fix for lora of qwen3vldense by @w-yyh in https://github.com/PaddlePaddle/PaddleFormers/pull/3434
[dataflow] fix system info in sft and dpo by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/3441
add recompute check by @Waynezee in https://github.com/PaddlePaddle/PaddleFormers/pull/3388
Fleet aoa by @changeyoung98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3428
add benchmark yaml by @Waynezee in https://github.com/PaddlePaddle/PaddleFormers/pull/3442
update qwen config by @huangjiyi in https://github.com/PaddlePaddle/PaddleFormers/pull/3435
【lora】modify stop_gradient for fleet model by @xiaoguoguo626807 in https://github.com/PaddlePaddle/PaddleFormers/pull/3445
[CI] update precision method by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3444
fix cp data padding by @Wennie396 in https://github.com/PaddlePaddle/PaddleFormers/pull/3440
[Fleet CI]add glm45 ci lora/dpo end to end by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/3426
fix qwen grouped_gemm aoa by @huangjiyi in https://github.com/PaddlePaddle/PaddleFormers/pull/3438
lora model supports merge state_dict by @cjw-d in https://github.com/PaddlePaddle/PaddleFormers/pull/3427
no need to save tp size in config by @lugimzzz in https://github.com/PaddlePaddle/PaddleFormers/pull/3455
[Unified MoE Layer] Fix Router topk_weigtht in noaux_tc Method by @hushenwei2000 in https://github.com/PaddlePaddle/PaddleFormers/pull/3452
Docs(proxy): Update proxy settings in README by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3456
fix sft function call eval bug by @a31413510 in https://github.com/PaddlePaddle/PaddleFormers/pull/3457
[CI] Turn on flex checkpoint by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/3375
fix yaml by @Waynezee in https://github.com/PaddlePaddle/PaddleFormers/pull/3460
[dev] #Adapt to erniebot by @miao200years in https://github.com/PaddlePaddle/PaddleFormers/pull/3466
[PaddleFleet] Remove moe_deep_gemm configuration option by @hushenwei2000 in https://github.com/PaddlePaddle/PaddleFormers/pull/3443
[Fleet]support optim offload for FC by @changeyoung98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3458
Fix(processor): Add validation in download cache lookup by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3469
[dataflow] fix default template backend by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/3459
Add extra require by @risemeup1 in https://github.com/PaddlePaddle/PaddleFormers/pull/3464
fix no lazy import by @lugimzzz in https://github.com/PaddlePaddle/PaddleFormers/pull/3471
fit lora load&save with fc by @llbdyiu66 in https://github.com/PaddlePaddle/PaddleFormers/pull/3436
[Bug Fix]replace dtype to torch_dtype & remove head_dim from ernie_vl by @BossPi in https://github.com/PaddlePaddle/PaddleFormers/pull/3448
Feat(processor): Support for legacy_serialization to save components separately by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3437
[dev] fix_erniebot by @miao200years in https://github.com/PaddlePaddle/PaddleFormers/pull/3476
[Fix] Add cuda availability check for flash attention configuration by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/3484
Feat(lora): Support LoRA for ERNIE4.5-VL by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3483
[dev] peft fix by @miao200years in https://github.com/PaddlePaddle/PaddleFormers/pull/3486
modify paddleformers installation by @a31413510 in https://github.com/PaddlePaddle/PaddleFormers/pull/3478
[CI] Temporarily add Paddle link by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3489
[fix (sft)]: sync recompute config to vision model by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/3490
add trl/llm_utils by @a31413510 in https://github.com/PaddlePaddle/PaddleFormers/pull/3474
Update README.md modify install by @a31413510 in https://github.com/PaddlePaddle/PaddleFormers/pull/3491
【lora】fleet model with Lora can only support origin expert compute, can't use fused_moe by @xiaoguoguo626807 in https://github.com/PaddlePaddle/PaddleFormers/pull/3450
delete cli moe_use_fused_node by @xiaoguoguo626807 in https://github.com/PaddlePaddle/PaddleFormers/pull/3496
[dataflow] fix ernie thinking template by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/3480
Feat(lora): Support FusedLinear layer in LoRA by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3488
Fix(qwenvl): Fix recompute args for qwenvl by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3494
modify get_linear_type by @cjw-d in https://github.com/PaddlePaddle/PaddleFormers/pull/3497
[Refactor] Remove explicit Pipe model checks in lora utils by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/3493
fix default environment variable in cli by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/3500
[CI] fix requirements approved by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/3473
[PaddleFleet] Add moe_ep_barrier config by @hushenwei2000 in https://github.com/PaddlePaddle/PaddleFormers/pull/3498
[CI] Add timeout & update precision shell by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3515
Revert "delete cli moe_use_fused_node" by @xiaoguoguo626807 in https://github.com/PaddlePaddle/PaddleFormers/pull/3512
update_qwen_benchmark_config by @huangjiyi in https://github.com/PaddlePaddle/PaddleFormers/pull/3508
[CI] fix_test_save_load by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/3514
Fix hf fc default value by @llbdyiu66 in https://github.com/PaddlePaddle/PaddleFormers/pull/3507
[Bug fix] replace dtype to torch_dtype by @BossPi in https://github.com/PaddlePaddle/PaddleFormers/pull/3479
【fleet 】fix dpo lora by @xiaoguoguo626807 in https://github.com/PaddlePaddle/PaddleFormers/pull/3522
update benchmark yaml by @Waynezee in https://github.com/PaddlePaddle/PaddleFormers/pull/3529
【fleet】Fallback lora by @xiaoguoguo626807 in https://github.com/PaddlePaddle/PaddleFormers/pull/3526
fix dtype in from_pretrained in aoa by @llbdyiu66 in https://github.com/PaddlePaddle/PaddleFormers/pull/3528
rm triton by @a31413510 in https://github.com/PaddlePaddle/PaddleFormers/pull/3516
fix no template bug by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/3521
XPU support for ERNIE-4.5-21B-A3B by @DongBaiYue in https://github.com/PaddlePaddle/PaddleFormers/pull/3481
fix CUDA_PATH Flag by @Zeref996 in https://github.com/PaddlePaddle/PaddleFormers/pull/3523
[Fleet]Fix num_experts for qwen by @changeyoung98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3527
Set default args of amp_master_grad, amp_custom_black/white_list, sharding stage1 v2 by @WYB27 in https://github.com/PaddlePaddle/PaddleFormers/pull/3258
Standardize config type to Optional with None default by @zhangbo9674 in https://github.com/PaddlePaddle/PaddleFormers/pull/3524
[Fix]fix for stage3+lora resume Qwen3vl by @w-yyh in https://github.com/PaddlePaddle/PaddleFormers/pull/3532
Add fsdp yaml by @w-yyh in https://github.com/PaddlePaddle/PaddleFormers/pull/3517
[Feat]Lora for Qwen3vl moe by @w-yyh in https://github.com/PaddlePaddle/PaddleFormers/pull/3506
Add flags use cuda managed memory by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/3560
fleet-release by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3539
[Bug Fix] Fix all_gather error when training ERNIE-VL by @BossPi in https://github.com/PaddlePaddle/PaddleFormers/pull/3531
[Bug fix] fix position ids by @BossPi in https://github.com/PaddlePaddle/PaddleFormers/pull/3537
[fix] support FC with torch_dtype for paddleocr_vl by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/3533
deprecate fuse_linear by @cjw-d in https://github.com/PaddlePaddle/PaddleFormers/pull/3534
Fix(ckpt): Fix checkpoint saving to support multimodal models(LoRA training) && json saving missing by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3541
Fix(ci): Update repo_id for Qwen2 unittest case by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3509
[Fleet]Enable eval for fleet by @changeyoung98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3535
[bug fix] Fix typo by @BossPi in https://github.com/PaddlePaddle/PaddleFormers/pull/3562
rm paddleformers.trl by @a31413510 in https://github.com/PaddlePaddle/PaddleFormers/pull/3543
[Fix]Qwen3vl dense&moe CI by @w-yyh in https://github.com/PaddlePaddle/PaddleFormers/pull/3499
[CI] fix precision approval by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3570
Adapter SonicMoE by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/3565
align recompute by @Waynezee in https://github.com/PaddlePaddle/PaddleFormers/pull/3552
remove old tensor_parallel_mapping by @llbdyiu66 in https://github.com/PaddlePaddle/PaddleFormers/pull/3475
Upgrade EB4.5 pretrain to Paddle 3.3.0 by @lshpku in https://github.com/PaddlePaddle/PaddleFormers/pull/3525
support qwen3moe dpo. by @wtmlon in https://github.com/PaddlePaddle/PaddleFormers/pull/3566
[dev] #paddlefleet&&paddlepaddle version by @miao200years in https://github.com/PaddlePaddle/PaddleFormers/pull/3545
[dataflow] fix the padding len by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/3504
config add tie_word_embeddings by @BossPi in https://github.com/PaddlePaddle/PaddleFormers/pull/3571
[Fleet] add glm45-air 64k benchmark yaml by @xuxinyi389 in https://github.com/PaddlePaddle/PaddleFormers/pull/3578
add qwen 64k pt&sft benchmark yaml by @Wennie396 in https://github.com/PaddlePaddle/PaddleFormers/pull/3567
fit no use fc args by @llbdyiu66 in https://github.com/PaddlePaddle/PaddleFormers/pull/3558
update benchmark yaml by @Waynezee in https://github.com/PaddlePaddle/PaddleFormers/pull/3561
[CI] add branch release by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3579
[dev][fix] args by @miao200years in https://github.com/PaddlePaddle/PaddleFormers/pull/3574
fix(dpo): correct response_index and replace use_sparse_head_and_loss_fn by @Lcysabcu in https://github.com/PaddlePaddle/PaddleFormers/pull/3553
add ernie45 pretrain to cli by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/3563
fix safetensors file name by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/3573
Sync Qwen3VL position_id computation by @qhpeklh5959 in https://github.com/PaddlePaddle/PaddleFormers/pull/3513
Change Qwen3MoE/Glm4MoE Model to PaddleFleet Implementation by @hushenwei2000 in https://github.com/PaddlePaddle/PaddleFormers/pull/3580
xpu support deepseekv3 sft by @ZhangX-21 in https://github.com/PaddlePaddle/PaddleFormers/pull/3556
update readme by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3582
Update readme for ERNIE4.5VL by @BossPi in https://github.com/PaddlePaddle/PaddleFormers/pull/3585
[Iluvatar_gpu] ERNIE 21B/0.3B SFT by @YqGe585 in https://github.com/PaddlePaddle/PaddleFormers/pull/3586
[fix]fix for stage by @w-yyh in https://github.com/PaddlePaddle/PaddleFormers/pull/3584
[dev] Compatibility scheme for importlib by @miao200years in https://github.com/PaddlePaddle/PaddleFormers/pull/3589
[xpu] add ernie4.5 yaml by @DongBaiYue in https://github.com/PaddlePaddle/PaddleFormers/pull/3588
[metax_gpu] use method for ernie in metax-gpu. by @xuanyuanminzheng in https://github.com/PaddlePaddle/PaddleFormers/pull/3576
[Add]lora fsdp yaml by @w-yyh in https://github.com/PaddlePaddle/PaddleFormers/pull/3596
Fix training error introduced by the compatibility handling for differing position_ids dimensions passed in by generate by @qhpeklh5959 in https://github.com/PaddlePaddle/PaddleFormers/pull/3583
add fp8_linear in ce deepseekv3 by @a31413510 in https://github.com/PaddlePaddle/PaddleFormers/pull/3559
Add config.yaml and bash for PaddleOCR-VL by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/3597
fix fleet sft bug. by @wtmlon in https://github.com/PaddlePaddle/PaddleFormers/pull/3604
[Benchmark]Fix yaml by @xuxinyi389 in https://github.com/PaddlePaddle/PaddleFormers/pull/3607
update dev version 1.0.0 by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/3612
Feat(processor): Update video processor accuracy alignment baseline in unittest by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3608
update default value for xxx_strategy and model_conf by @DrownFish19 in https://github.com/PaddlePaddle/PaddleFormers/pull/3609
[dev] fix argparse.ArgumentError by @miao200years in https://github.com/PaddlePaddle/PaddleFormers/pull/3614
Fix deepseek exe bug by cli by @zhangbo9674 in https://github.com/PaddlePaddle/PaddleFormers/pull/3623
Fix(fc): Fix AOA config error and update FC loading strategy in CI by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3595
fix profile duplicate nvtx by @huangjiyi in https://github.com/PaddlePaddle/PaddleFormers/pull/3622
fix eb45 pretrain init.py by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/3625
Add dsv3 pretrain by @zhangbo9674 in https://github.com/PaddlePaddle/PaddleFormers/pull/3628
update dpo yaml and add dpo_and_its_derivatives_zh.md by @Lcysabcu in https://github.com/PaddlePaddle/PaddleFormers/pull/3633
fix uv check by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3632
Use Paddle rms_norm and swiglu api by @zhangbo9674 in https://github.com/PaddlePaddle/PaddleFormers/pull/3634
Add docs for PaddleFormers v1.0 by @nepeplwu in https://github.com/PaddlePaddle/PaddleFormers/pull/3639
[Iluvatar] Fix shell by @YqGe585 in https://github.com/PaddlePaddle/PaddleFormers/pull/3646
[Fleet]change fleet case to benchmark config by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/3630
[CI] fix build whl & check release pr by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/3610
Feat(processor): Support ImageProcessorFast for multi-model by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3641
Fix typo in Metax GPU installation document link by @lugimzzz in https://github.com/PaddlePaddle/PaddleFormers/pull/3651
[CI]pass test for only md test by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/3661
fix check requirements by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/3663
Remove lora redundant code by @cjw-d in https://github.com/PaddlePaddle/PaddleFormers/pull/3660
align qwen moe by @cjw-d in https://github.com/PaddlePaddle/PaddleFormers/pull/3587
add qwen3vl benchmark monitor config yamls by @qhpeklh5959 in https://github.com/PaddlePaddle/PaddleFormers/pull/3672
[XPU][CI] add xpu ci case by @plusNew001 in https://github.com/PaddlePaddle/PaddleFormers/pull/3631
Fix(CI): Resolve CI fallback execution failure by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3670
[XPU] update paddlepaddle-xpu-3.3.0.dev20260122 by @DongBaiYue in https://github.com/PaddlePaddle/PaddleFormers/pull/3683
update precision method by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3673
[qwen3] Fix attention precision mismatch align eager_attention_forward with hf by @zhanghonggeng in https://github.com/PaddlePaddle/PaddleFormers/pull/3677
Support QLoRA by @BossPi in https://github.com/PaddlePaddle/PaddleFormers/pull/3685
【FlexCheckpoint】save/load support parallel_broadcast by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/3642
Test(requirement): Testing limit transformers version by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3703
change name by @xiaoguoguo626807 in https://github.com/PaddlePaddle/PaddleFormers/pull/3686
Fix attention precision mismatch align eager_attention_forward with hf by @zhanghonggeng in https://github.com/PaddlePaddle/PaddleFormers/pull/3708
[qwen3]Set default device cpu in default_rope_parameters by @zhanghonggeng in https://github.com/PaddlePaddle/PaddleFormers/pull/3692
Fix build paddleformers whl by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/3690
【FlexCheckpoint】Fix ignore load bug by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/3688
[dev] paddle fix by @miao200years in https://github.com/PaddlePaddle/PaddleFormers/pull/3698
support merge with qdq base model by @BossPi in https://github.com/PaddlePaddle/PaddleFormers/pull/3712
Fix(ci): Refactor GPU initialization to use decorator by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3694
Feat(processor): Support multi-batch size in view operations for Processor by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3718
Update README and add FAQs by @nepeplwu in https://github.com/PaddlePaddle/PaddleFormers/pull/3701
add fp8 callback by @Waynezee in https://github.com/PaddlePaddle/PaddleFormers/pull/3722
[glm4_moe & deepseek_v3]fd_fallback by @cjw-d in https://github.com/PaddlePaddle/PaddleFormers/pull/3728
【fleet】support Dpo lora by @xiaoguoguo626807 in https://github.com/PaddlePaddle/PaddleFormers/pull/3617
fix precision approve by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3733
fix cli distributed launch without rank info by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/3714
feat: support external templates and plugins with custom args by @wacxr123 in https://github.com/PaddlePaddle/PaddleFormers/pull/3730
[dataflow] add dataset sample strategy by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/3700
feat: support additional special tokens by @wacxr123 in https://github.com/PaddlePaddle/PaddleFormers/pull/3739
update fleet paddle by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3738
Add moe_use_pfcc_deepep args for paddlefleet by @ooooo-create in https://github.com/PaddlePaddle/PaddleFormers/pull/3725
[iluvatar] Add Iluvatar CI by @YqGe585 in https://github.com/PaddlePaddle/PaddleFormers/pull/3705
fix_ema_assembler_scaling_bug by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/3716
fix aoa config for fp8 && add paddlefleet commit in logfile by @Waynezee in https://github.com/PaddlePaddle/PaddleFormers/pull/3744
[Precision Change] No padding when cp*sp == 1 by @xuxinyi389 in https://github.com/PaddlePaddle/PaddleFormers/pull/3751
update fleet requirement by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3747
fix: new_special_tokens to path & extend it to DPO by @wacxr123 in https://github.com/PaddlePaddle/PaddleFormers/pull/3755
fleet args update addition. by @Feiye0979 in https://github.com/PaddlePaddle/PaddleFormers/pull/3544
【fleet】remove Qwen3vl aoa with fleet model change by @xiaoguoguo626807 in https://github.com/PaddlePaddle/PaddleFormers/pull/3749
[Feat]Modifying the expert part of the Qwen3VLMoe by @w-yyh in https://github.com/PaddlePaddle/PaddleFormers/pull/3753
[XPU] add ERNIE-4.5-VL-28B-A3B-Thinking SFT by @DongBaiYue in https://github.com/PaddlePaddle/PaddleFormers/pull/3754
fix fallback. initialize GPU by decorator by @Feiye0979 in https://github.com/PaddlePaddle/PaddleFormers/pull/3766
fix fallback. initialize GPU by decorator by @Feiye0979 in https://github.com/PaddlePaddle/PaddleFormers/pull/3771
Support Ernie VL eval by @BossPi in https://github.com/PaddlePaddle/PaddleFormers/pull/3770
update ci precision for fused_rms_norm and router_aux_loss_coef by @huangjiyi in https://github.com/PaddlePaddle/PaddleFormers/pull/3763
Fix qwen3 vl moe bug by @risemeup1 in https://github.com/PaddlePaddle/PaddleFormers/pull/3765
Change default value of router_aux_loss_coef. by @Feiye0979 in https://github.com/PaddlePaddle/PaddleFormers/pull/3775
MTP General Fix by @WYB27 in https://github.com/PaddlePaddle/PaddleFormers/pull/3756
Fix(CI): Fix ffmpeg install by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3780
update version 1.1.0 on develop by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/3778
Remove aux_loss_coef from config and modeling by @WYB27 in https://github.com/PaddlePaddle/PaddleFormers/pull/3781
[dataflow] add sft offline dataflow by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/3750
Fix dpo dataflow by @Lcysabcu in https://github.com/PaddlePaddle/PaddleFormers/pull/3743
xpu support zcc by @deepllz in https://github.com/PaddlePaddle/PaddleFormers/pull/3785
[Refactor] unify get_mm_inputs args and fix PaddleOCRVL plugin by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/3768
update benchmark case yaml by @XieYunshen in https://github.com/PaddlePaddle/PaddleFormers/pull/3757
Add global_norm in logs by @Waynezee in https://github.com/PaddlePaddle/PaddleFormers/pull/3793
Remove more aux loss coef by @WYB27 in https://github.com/PaddlePaddle/PaddleFormers/pull/3788
[Bug Fix]Fix dpo eval for unfleet pp model by @BossPi in https://github.com/PaddlePaddle/PaddleFormers/pull/3791
Feat(processor): Support transformers v5.0 by optimize processor by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3742
Fix dpo eval for fleet model by @BossPi in https://github.com/PaddlePaddle/PaddleFormers/pull/3798
Support wint4/8 quantization & quantization for Fleet Model by @BossPi in https://github.com/PaddlePaddle/PaddleFormers/pull/3783
Add skip ci & fix build whl by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/3732
update benchmark yaml by @Waynezee in https://github.com/PaddlePaddle/PaddleFormers/pull/3797
[CI] fix skip bug by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/3804
[Feat] support multi-modal pretraining (VL-PT) without packing by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/3796
add fleet_available by @BossPi in https://github.com/PaddlePaddle/PaddleFormers/pull/3807
change fleet version by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3773
Change default router_aux_loss_coef to 0.0 by @WYB27 in https://github.com/PaddlePaddle/PaddleFormers/pull/3808
fix ernie45_moe aoa gate transpose by @llbdyiu66 in https://github.com/PaddlePaddle/PaddleFormers/pull/3792
[CI] fix timeout by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/3746
fix model dpo by @Lcysabcu in https://github.com/PaddlePaddle/PaddleFormers/pull/3800
[Fix]Fix for flashmask+kvcache in generate by @w-yyh in https://github.com/PaddlePaddle/PaddleFormers/pull/3806
[BUG FIX]Fix ernie4.5 qlora by @BossPi in https://github.com/PaddlePaddle/PaddleFormers/pull/3811
[CI] Change XPU && ILUVATAR CI Case Dir by @plusNew001 in https://github.com/PaddlePaddle/PaddleFormers/pull/3818
Set default device cpu in default_rope_parameters by @zhanghonggeng in https://github.com/PaddlePaddle/PaddleFormers/pull/3761
[CI][XPU] update ubuntu22.04 by @DongBaiYue in https://github.com/PaddlePaddle/PaddleFormers/pull/3815
Fix(readme): Update tokenizer decode in README by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3809
[XPU] set truncate_packing=false by @DongBaiYue in https://github.com/PaddlePaddle/PaddleFormers/pull/3824
Fix(tokenizer): Fix tokenizer saving for ERNIE45 by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3825
fix flash saved signal by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/3812
[XPU][CI] add xpu ci case by @plusNew001 in https://github.com/PaddlePaddle/PaddleFormers/pull/3821
fix(hparams): auto-disable truncate_packing for VLMs instead of raising error by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/3826
[BUG FIX]Fix GPTModel by @BossPi in https://github.com/PaddlePaddle/PaddleFormers/pull/3837
Fix(processor): Resolve image processor loading bug by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3841
[Fleet CI]add_qwen3vl by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/3820
Fix(tokenizer): Resolve attribute check issue when saving vocab files by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3842
[XPU] support ERNIE-4.5-21B-A3B-Thinking SFT by @DongBaiYue in https://github.com/PaddlePaddle/PaddleFormers/pull/3843
fix mtp dataflow by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/3829
support mtp by @FeixLiu in https://github.com/PaddlePaddle/PaddleFormers/pull/3819
Fix(requirement): Limit transformers version into v5.0.0 by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3851
[CI] kill residual processes by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/3828
auto select fc comm method && force sync shared_weight by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/3817
Remove NumPy dependency for position_ids generation (PaddleOCRVLForConditionalGeneration) by @DrRyanHuang in https://github.com/PaddlePaddle/PaddleFormers/pull/3840
support_profile_data_load by @xuxinyi389 in https://github.com/PaddlePaddle/PaddleFormers/pull/3836
[CI] Reduce pytest-xdist worker count by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/3858
add extra padding for fp8 in dataset by @Waynezee in https://github.com/PaddlePaddle/PaddleFormers/pull/3838
add save_hf_log and fix comm method select by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/3861
Fix save when moe_sharding_degree > 1 by @Waynezee in https://github.com/PaddlePaddle/PaddleFormers/pull/3863
Optimize slicing in PaddleOCRVisionEmbeddings by @DrRyanHuang in https://github.com/PaddlePaddle/PaddleFormers/pull/3857
[Fleet CI]change paddle to release by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/3866
fix mtp aoa by @FeixLiu in https://github.com/PaddlePaddle/PaddleFormers/pull/3868
Fix offload fp8 master weights in callback by @Waynezee in https://github.com/PaddlePaddle/PaddleFormers/pull/3865
Add retry for iluvatar CI by @YqGe585 in https://github.com/PaddlePaddle/PaddleFormers/pull/3870
fix: fa_version and support fa4 by @xxyux in https://github.com/PaddlePaddle/PaddleFormers/pull/3877
[Glm4vMoe]add glm4v_moe by @lijialin03 in https://github.com/PaddlePaddle/PaddleFormers/pull/3298
Remove NumPy dependency for position_ids generation (PaddleOCREncoder) by @DrRyanHuang in https://github.com/PaddlePaddle/PaddleFormers/pull/3883
fix pretrain dataflow by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/3882
chore: misc updates for PaddleOCRVL dy2st support by @DrRyanHuang in https://github.com/PaddlePaddle/PaddleFormers/pull/3881
support inv aoa by @FeixLiu in https://github.com/PaddlePaddle/PaddleFormers/pull/3885
Fix cuda pip install quoting by @ooooo-create in https://github.com/PaddlePaddle/PaddleFormers/pull/3880
add processor_use_fast config by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/3888
Feat(kimi-k25):Support tokenizer && processor for Kimi-K2.5 by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3856
Add new a100 by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/3859
support dpo vl by @Lcysabcu in https://github.com/PaddlePaddle/PaddleFormers/pull/3847
fix_vit by @xuxinyi389 in https://github.com/PaddlePaddle/PaddleFormers/pull/3894
flashattn compute merge to once by @sevenan2 in https://github.com/PaddlePaddle/PaddleFormers/pull/3886
move T pos by @FeixLiu in https://github.com/PaddlePaddle/PaddleFormers/pull/3902
Fix(kimi-k25): Remove pydantic dependency by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3905
[Fleet CI]add qwen3vl by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/3896
Fix freeze_config by @xuxinyi389 in https://github.com/PaddlePaddle/PaddleFormers/pull/3899
Ci update pytest xdist by @Feiye0979 in https://github.com/PaddlePaddle/PaddleFormers/pull/3879
[fix(datasets)]: fix missing return in mm_plugin by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/3914
[Glm4vMoe]add transpose_weight_keys for lora export by @lijialin03 in https://github.com/PaddlePaddle/PaddleFormers/pull/3900
add auto cp release by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3923
[fix]unittest ci hang by @Lcysabcu in https://github.com/PaddlePaddle/PaddleFormers/pull/3921
[export] fit transpose_keys with aoa by @llbdyiu66 in https://github.com/PaddlePaddle/PaddleFormers/pull/3926
Delete moe use pfcc deepep by @risemeup1 in https://github.com/PaddlePaddle/PaddleFormers/pull/3932
[Opt] Optimize image embeddings insertion by replacing masked_scatter with boolean indexing by @DrRyanHuang in https://github.com/PaddlePaddle/PaddleFormers/pull/3933
fix release build whl by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/3938
[Perf] Disable preserve_external_rng_state in recompute for better performance by @DrRyanHuang in https://github.com/PaddlePaddle/PaddleFormers/pull/3936
dataflow reader bugfix by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/3906
add flash mask check by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/3943
[Perf] Refactor vision position embedding construction with preallocation and caching by @DrRyanHuang in https://github.com/PaddlePaddle/PaddleFormers/pull/3945
update args by @cjw-d in https://github.com/PaddlePaddle/PaddleFormers/pull/3934
[Fleet CI Dev]add glm pt ep4 && change dataloader number by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/3931
fix_dev_whl_name by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/3947
[feat(save)]: explicit copy_custom_file_list param by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/3891
[fleet ci dev]change version by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/3955
[Perf] Fix F.interpolate output dtype under AMP for PaddleOCR-VL vision embeddings by @DrRyanHuang in https://github.com/PaddlePaddle/PaddleFormers/pull/3951
support distill by @LiYuRio in https://github.com/PaddlePaddle/PaddleFormers/pull/3911
revert test_save_load by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/3919
add_mtp_loss_log by @xuxinyi389 in https://github.com/PaddlePaddle/PaddleFormers/pull/3960
[Glm4vMoe]fix sequence_parallel by @lijialin03 in https://github.com/PaddlePaddle/PaddleFormers/pull/3954
[export]fix _gen_aoa_config bug for vl model by @lijialin03 in https://github.com/PaddlePaddle/PaddleFormers/pull/3964
add qwen3_5 vit by @Lcysabcu in https://github.com/PaddlePaddle/PaddleFormers/pull/3949
[Fleet CI]fix vl moe fc by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/3970
change fleet version by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/3979
fix distill bug with need_clear by @LiYuRio in https://github.com/PaddlePaddle/PaddleFormers/pull/3981
Fix(processor): Support transformers >=5.0.0 by optimizing processor by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3848
Fix(training): Fallback to slow processor for VL models with num_workers > 0 by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3966
[CI] temporarily skip test_save_load for PaddleOCR-VL by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/3986
[Perf] Optimize image projector: Replace loop-based reshape with vectorized index-gather by @DrRyanHuang in https://github.com/PaddlePaddle/PaddleFormers/pull/3985
Fix(API): Update tensor mul with string usage by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/3988
Add GLM SFT 128K yaml by @Wennie396 in https://github.com/PaddlePaddle/PaddleFormers/pull/3679
update benchmark_yaml by @WYB27 in https://github.com/PaddlePaddle/PaddleFormers/pull/3969
[Feat ]add loss fn parameter qwen3vl fleet by @w-yyh in https://github.com/PaddlePaddle/PaddleFormers/pull/3989
Add moe_correction_bias_lr by @WYB27 in https://github.com/PaddlePaddle/PaddleFormers/pull/3992
[Perf] Optimize data preprocess(get_rope_index & _postprocess_sequence & mm_collate_fn) by @DrRyanHuang in https://github.com/PaddlePaddle/PaddleFormers/pull/3987
[Perf]: Optimize PaddleOCR-VL vision encoder preprocessing (3.3ms → 2.1ms) by @DrRyanHuang in https://github.com/PaddlePaddle/PaddleFormers/pull/3991
add multi processing for sft/pt dataflow by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/3869
disable qwen3next/qwen3vl test_save_load by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4000
Feature(triton): Add Triton RoPE kernel for vision models by @DrRyanHuang in https://github.com/PaddlePaddle/PaddleFormers/pull/3995
support no-pp save huggingface by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/3999
Add glm_ocr by @ZouKexin-522 in https://github.com/PaddlePaddle/PaddleFormers/pull/3978
[Benchmark]add eb45 benchmark by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4002
Fix numpy2 bug by @risemeup1 in https://github.com/PaddlePaddle/PaddleFormers/pull/4004
fix_aoa_config by @Waynezee in https://github.com/PaddlePaddle/PaddleFormers/pull/4007
[BENCHMARK]fix eb45 gbs by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4008
Fix(CI): Fix gpu_device_initializer usage in glm4vmoe && glmocr CI by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/4012
[CI] Simplify paddlepaddle-iluvatar installation in iluvatar CI by @YqGe585 in https://github.com/PaddlePaddle/PaddleFormers/pull/4003
[feat] offline data bin merger by @wacxr123 in https://github.com/PaddlePaddle/PaddleFormers/pull/4011
Add some parameters to config.json by @Lcysabcu in https://github.com/PaddlePaddle/PaddleFormers/pull/4015
Fix protobuf 7 bug by @risemeup1 in https://github.com/PaddlePaddle/PaddleFormers/pull/4013
support modeling_pp.py for no pp parallel strategy by @Hz188 in https://github.com/PaddlePaddle/PaddleFormers/pull/3928
add-qwen2/3-fleet by @Yang-Yi20 in https://github.com/PaddlePaddle/PaddleFormers/pull/3965
fix_yaml by @xuxinyi389 in https://github.com/PaddlePaddle/PaddleFormers/pull/4023
[BENCHMARK]fix benchmark path by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4025
update qwen vl config dataset path to absolute path by @huangjiyi in https://github.com/PaddlePaddle/PaddleFormers/pull/4031
Fix numpy by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4030
fix gbs in estimate max steps by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/4029
[CI] disable uv by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/4040
Update setup.py fleet by @xiaoguoguo626807 in https://github.com/PaddlePaddle/PaddleFormers/pull/4036
fix training tensorboard by @llbdyiu66 in https://github.com/PaddlePaddle/PaddleFormers/pull/4019
add qwen2moe fleet & fix qwen2/3 by @Yang-Yi20 in https://github.com/PaddlePaddle/PaddleFormers/pull/4037
【dataflow】memory-efficient offline dataflow by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/4009
save last step ckpt by @llbdyiu66 in https://github.com/PaddlePaddle/PaddleFormers/pull/4034
update sft offline dataset merge script by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/4050
Fix qwen3vl aoa by @qhpeklh5959 in https://github.com/PaddlePaddle/PaddleFormers/pull/4044
fix(generate): replace bit-wise NOT with subtraction for logic NOT by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/4033
fix_use_triton_in_paddle_download by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4060
【Kimik2】 add kimik2 model by @xiaoguoguo626807 in https://github.com/PaddlePaddle/PaddleFormers/pull/4021
add global_save_step by @llbdyiu66 in https://github.com/PaddlePaddle/PaddleFormers/pull/4047
fix num_reserved_tokens_for_each_dialog in dataflow by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/4057
[BENCHMARK]fix benchmark seq len by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4071
Fix DSV3 speed degradation after adapting to Paddle 3.3.0 by @lshpku in https://github.com/PaddlePaddle/PaddleFormers/pull/3844
[FIX]Fix kimi k25 tokenizer by @Linboyan-trc in https://github.com/PaddlePaddle/PaddleFormers/pull/4066
[docs]: add PaddleOCR-VL-1.5 best practice docs and configs by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/4076
Add fallback for DSV3 deep_gemm version by @lshpku in https://github.com/PaddlePaddle/PaddleFormers/pull/4078
[CI] del uv by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/4084
[BENCHMARK]add deepseek v3 by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4074
fix iter all examples in packing by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/4083
[docs]: copy yml for PaddleOCR by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/4086
fix to run fleet by @sevenan2 in https://github.com/PaddlePaddle/PaddleFormers/pull/3976
Assert sd>1 when enable moe_grouped_gemm by @Waynezee in https://github.com/PaddlePaddle/PaddleFormers/pull/4080
fix log global_save_step by @llbdyiu66 in https://github.com/PaddlePaddle/PaddleFormers/pull/4075
rename moe_correction_bias_lr by @WYB27 in https://github.com/PaddlePaddle/PaddleFormers/pull/4097
[feat] Align dataset flow for Qwen3-omni by @wacxr123 in https://github.com/PaddlePaddle/PaddleFormers/pull/3962
[Qwen3_VL] speed_up by @xuxinyi389 in https://github.com/PaddlePaddle/PaddleFormers/pull/4092
add qwen3 omni by @zxcd in https://github.com/PaddlePaddle/PaddleFormers/pull/3910
fix num epochs by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/4099
fix dev build whl by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4105
[BENCHMARK]fix 21B by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4095
Fix qwen3vl weight saving issue by @qhpeklh5959 in https://github.com/PaddlePaddle/PaddleFormers/pull/4103
[Fleet CI] fix log check bug by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4109
add iterator and map dataflow by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/4094
add ce dpo-vl data by @Lcysabcu in https://github.com/PaddlePaddle/PaddleFormers/pull/4081
fix bugs by @Yang-Yi20 in https://github.com/PaddlePaddle/PaddleFormers/pull/4108
Fix rope precision by @xuxinyi389 in https://github.com/PaddlePaddle/PaddleFormers/pull/4116
[Auto-Parallel] Sync config to Fleet and fix use_intermediate_api by @Xing-lil in https://github.com/PaddlePaddle/PaddleFormers/pull/4119
[Fleet CI]change ci type by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4117
Revert "[Perf] Disable preserve_external_rng_state in recompute for better performance" by @DrRyanHuang in https://github.com/PaddlePaddle/PaddleFormers/pull/4118
[fix] initial input_ids in streamer when generate by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/4059
[XPU][Fix] ernie_45_vl_28b_a3b_thinking_sft_32k: sync config with removed/renamed params in PR #4108 by @DongBaiYue in https://github.com/PaddlePaddle/PaddleFormers/pull/4130
[Doc]install: add stable index-url by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4135
Support more GPU arches for FA by @sneaxiy in https://github.com/PaddlePaddle/PaddleFormers/pull/4136
[XPU] Fix fleet initialization bug on XPU device by @ZhangX-21 in https://github.com/PaddlePaddle/PaddleFormers/pull/4129
deprecate offload_optim functionality and related code by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/4125
sort safetensors index entry by file name by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/4128
add mtp benchmark case to PaddleFormers by @wangyuwen1999 in https://github.com/PaddlePaddle/PaddleFormers/pull/4132
Fix(training): Support Fast Processor for VL models with num_workers>0 by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/4041
Fleet mtp upgrade dev by @wtmlon in https://github.com/PaddlePaddle/PaddleFormers/pull/4140
【Kimik2】 support moe_group_gemm aoa by @xiaoguoguo626807 in https://github.com/PaddlePaddle/PaddleFormers/pull/4141
Fix(CI): Change QwenVL case for Fleet CI by @Ace-To-HYB in https://github.com/PaddlePaddle/PaddleFormers/pull/4144
Remove no delay_loss_scale branch by @Waynezee in https://github.com/PaddlePaddle/PaddleFormers/pull/4088
fix bug by @Yang-Yi20 in https://github.com/PaddlePaddle/PaddleFormers/pull/4147
remove multi gpus unittest by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/3953
Add use_triton_in_paddle check by @DrRyanHuang in https://github.com/PaddlePaddle/PaddleFormers/pull/4146
fix apply_rope_fusion in Qwen3VL by @huangjiyi in https://github.com/PaddlePaddle/PaddleFormers/pull/4160
[CI]fix transformers version by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4164
[WIP] qwen vl sp by @FeixLiu in https://github.com/PaddlePaddle/PaddleFormers/pull/4127
[docs] update rope triton kernel usage and lora export in PaddleOCR-VL by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/4148
[Model(PaddleOCR-VL)] revert packing_position_embedding by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/4163
[BENCHMARK]deepseek vs change to 8 cards by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4166
change fleet 0.2.0 by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/4168
add save_hf_memory_growth_threshold arg by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/4155
【kimik2】change aoa ,change rotary_interleave = True by @xiaoguoguo626807 in https://github.com/PaddlePaddle/PaddleFormers/pull/4175
Fix qwen3moe pipelinelayer and loss_fn is None and redundant sharding in auto-parallel by @sevenan2 in https://github.com/PaddlePaddle/PaddleFormers/pull/4123
[Qwen3_VL] support list[Tensor] inputs and mrope position_ids by @Guo-Yilong in https://github.com/PaddlePaddle/PaddleFormers/pull/4107
update paddlefleet for VIT by @Eddie-Wang1120 in https://github.com/PaddlePaddle/PaddleFormers/pull/4176
【Kimik2】 fix kimi aoa by @xiaoguoguo626807 in https://github.com/PaddlePaddle/PaddleFormers/pull/4180
optimize qwen3_vl fleet vision positional embedding computation by @huangjiyi in https://github.com/PaddlePaddle/PaddleFormers/pull/4161
fix bugs by @Yang-Yi20 in https://github.com/PaddlePaddle/PaddleFormers/pull/4188
Add Qwen3.5 PaddleFleet model by @pkuzyc in https://github.com/PaddlePaddle/PaddleFormers/pull/3472
fix import quantize by @Yang-Yi20 in https://github.com/PaddlePaddle/PaddleFormers/pull/4193
[docs]: update PaddleOCR-VL-1.5 config paths by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/4184
[Fleet Version]change to 0.2.0.post20260401+df6b68ff7cb by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4195
[Qwen3_VL] Optimize VisionModel: pre-compute attn_mask and pass total_seqlen by @huangjiyi in https://github.com/PaddlePaddle/PaddleFormers/pull/4173
support distill with pp_need_data by @AlAuAu in https://github.com/PaddlePaddle/PaddleFormers/pull/4183
bug fix by @FeixLiu in https://github.com/PaddlePaddle/PaddleFormers/pull/4198
add FT_test glm45 yaml by @llbdyiu66 in https://github.com/PaddlePaddle/PaddleFormers/pull/4200
Fuse moe lora by @Lcysabcu in https://github.com/PaddlePaddle/PaddleFormers/pull/4192
add doc. (#4182) by @wtmlon in https://github.com/PaddlePaddle/PaddleFormers/pull/4204
support_aoa by @xuxinyi389 in https://github.com/PaddlePaddle/PaddleFormers/pull/4209
[Fleet]change to 0.2.0.post20260402+b5e37645280 by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4213
[feat(sampler)]: support consumed data skip for mapping sampler by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/4206
fix qwen2_5_vl lora merge by @Lcysabcu in https://github.com/PaddlePaddle/PaddleFormers/pull/4212
add data truncation strategy by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/4203
fix_qwen35_aoa by @xuxinyi389 in https://github.com/PaddlePaddle/PaddleFormers/pull/4215
Support GLM-5 by @changeyoung98 in https://github.com/PaddlePaddle/PaddleFormers/pull/3940
Qwen35 raise error when tp > 1 by @xuxinyi389 in https://github.com/PaddlePaddle/PaddleFormers/pull/4219
optimize for trainer_state save time by @Hz188 in https://github.com/PaddlePaddle/PaddleFormers/pull/4220
[Fleet]change fleet version to 0.2.0.post20260407+8431b910ac9 by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4229
[fix(sampler)] revert NlpDistributedBatchSampler branch by @forBlank in https://github.com/PaddlePaddle/PaddleFormers/pull/4227
Qwen35 aoa support tp greater 1 by @xuxinyi389 in https://github.com/PaddlePaddle/PaddleFormers/pull/4223
[PaddleFleet] change to 0.2.0.post20260408+823309dec9b by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4233
fix CI test_save_load by @Lcysabcu in https://github.com/PaddlePaddle/PaddleFormers/pull/4224
update train params for FT by @hysunflower in https://github.com/PaddlePaddle/PaddleFormers/pull/4222
drop flags for FT by @hysunflower in https://github.com/PaddlePaddle/PaddleFormers/pull/4238
Add minimax by @changeyoung98 in https://github.com/PaddlePaddle/PaddleFormers/pull/4221
support ernie code update by @AlAuAu in https://github.com/PaddlePaddle/PaddleFormers/pull/4232
fix transformers download by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4248
add moe_deepgemm by @zhangbo9674 in https://github.com/PaddlePaddle/PaddleFormers/pull/4242
[Fleet CI]change to python 3.12 by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4255
[Fleet]change to 0.2.0.post20260410+ca3866caa49 by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4260
[feat][perf][refactor] MapDataset packing by @wacxr123 in https://github.com/PaddlePaddle/PaddleFormers/pull/4159
[perf] skip_warmup & warmup on rank0 in one machine only by @wacxr123 in https://github.com/PaddlePaddle/PaddleFormers/pull/4237
[Compatibility] replace torch proxy compat aliases with public APIs by @ShigureNyako in https://github.com/PaddlePaddle/PaddleFormers/pull/4265
fix _unsavable_keys by @Lcysabcu in https://github.com/PaddlePaddle/PaddleFormers/pull/4261
add new model ci/ce by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4145
fix_new_models_without_case by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4270
remove useless config in eb45 pretrain yaml by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/4272
[Qwen3_VL] Enable bias_activation_fusion and apply_rope_fusion by @huangjiyi in https://github.com/PaddlePaddle/PaddleFormers/pull/4186
[refactor] sharding strategy by @wacxr123 in https://github.com/PaddlePaddle/PaddleFormers/pull/4271
upload glm45 sft cp ci case by @wangyuwen1999 in https://github.com/PaddlePaddle/PaddleFormers/pull/4254
[Trainer] feat(callback): support Fleet MoE class and GlobalRNGCallback by @hushenwei2000 in https://github.com/PaddlePaddle/PaddleFormers/pull/4273
add dataflow test by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/4236
【FT】Fix Fault Tolerance in PaddleFleet by @Xing-lil in https://github.com/PaddlePaddle/PaddleFormers/pull/4264
add ce task by @Liujie0926 in https://github.com/PaddlePaddle/PaddleFormers/pull/4245
[CE] fix default python by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4288
fix ce task by @Liujie0926 in https://github.com/PaddlePaddle/PaddleFormers/pull/4289
support ernie dense model with ema by @AlAuAu in https://github.com/PaddlePaddle/PaddleFormers/pull/4287
Adapt Benchmark regression cases by @XieYunshen in https://github.com/PaddlePaddle/PaddleFormers/pull/4244
[Fleet CI]add more single card concurrency by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4281
[test] add MapSFTDataset test & fix mapacking processor_func by @wacxr123 in https://github.com/PaddlePaddle/PaddleFormers/pull/4280
[CE]fix upload by @Liujie0926 in https://github.com/PaddlePaddle/PaddleFormers/pull/4291
fix build whl by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4292
[refactor] replace decord by paddlecodec by @wacxr123 in https://github.com/PaddlePaddle/PaddleFormers/pull/4278
[Fleet CI]Precision Update Change by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4285
[ce]fix bce version by @Liujie0926 in https://github.com/PaddlePaddle/PaddleFormers/pull/4299
[Pipeline Parallel]: Replace PaddleFleet PP with Paddle PP by @hushenwei2000 in https://github.com/PaddlePaddle/PaddleFormers/pull/3930
20260414 add ai edited test by @liuhao2638 in https://github.com/PaddlePaddle/PaddleFormers/pull/4283
[XPU][CI]update base value by @plusNew001 in https://github.com/PaddlePaddle/PaddleFormers/pull/4303
[CI] add qwen3 vl by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4284
[Fleet CI] del apt update by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4306
test ci baseline by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4309
[Trainer] Fix callbacks when PaddleFleet can not use by @hushenwei2000 in https://github.com/PaddlePaddle/PaddleFormers/pull/4300
Set profile timers from PaddleFleet in Trainer init by @huangjiyi in https://github.com/PaddlePaddle/PaddleFormers/pull/4315
[CI]fix cherry pick by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4317
【ZCC】Fix ema with unshard param by @Xing-lil in https://github.com/PaddlePaddle/PaddleFormers/pull/4316
[CI] add fleet model skip logic by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4321
fix_qwen3_vl_resume by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4318
Zcc adapter Muon by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/4325
add stream dataset dist dataloader by @Jonathans575 in https://github.com/PaddlePaddle/PaddleFormers/pull/4154
【FD】add get_inputembeding, get_lm_head by @xiaoguoguo626807 in https://github.com/PaddlePaddle/PaddleFormers/pull/4330
Fix model_config summary logging for GPT provider by @huangjiyi in https://github.com/PaddlePaddle/PaddleFormers/pull/4335
add mla aoa config by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/4341
[Iluvatar] Update baseline by @tianyuzhou669 in https://github.com/PaddlePaddle/PaddleFormers/pull/4336
add freeze_training flag by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/4340
[XPU][CI] Update XPU CI baseline by @plusNew001 in https://github.com/PaddlePaddle/PaddleFormers/pull/4345
fix_Qwen3VLVisionProvider_fa_version_bug by @risemeup1 in https://github.com/PaddlePaddle/PaddleFormers/pull/4344
[Deps] pin paddlecodec to >=0.1, <0.2 for Paddle 3.3 compatibility by @SigureMo in https://github.com/PaddlePaddle/PaddleFormers/pull/4348
Test qwen3vl by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4346
Add high_precision_rope cfg by @zhangbo9674 in https://github.com/PaddlePaddle/PaddleFormers/pull/4329
disable kimi2_k2 unittest by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4356
update fleet version by @Wennie396 in https://github.com/PaddlePaddle/PaddleFormers/pull/4342
fix ema state assembler by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/4350
add muon by @xxyux in https://github.com/PaddlePaddle/PaddleFormers/pull/4231
separate mtp head & loss for pp balance by @Wennie396 in https://github.com/PaddlePaddle/PaddleFormers/pull/4327
fix minimax v2 aoa config by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/4357
update unittest ce by @Liujie0926 in https://github.com/PaddlePaddle/PaddleFormers/pull/4351
add mtp attn mask and layer mask by @weiyixuanxx in https://github.com/PaddlePaddle/PaddleFormers/pull/4343
Muon update attention_heads by @xxyux in https://github.com/PaddlePaddle/PaddleFormers/pull/4368
[FA] support flashmask_use_varlen by @umiswing in https://github.com/PaddlePaddle/PaddleFormers/pull/4366
zcc muon support register buffer by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/4369
[fix][ci] glm4_moe template align & MapSFTdataset ci by @wacxr123 in https://github.com/PaddlePaddle/PaddleFormers/pull/4243
[CI]update unittest by @Liujie0926 in https://github.com/PaddlePaddle/PaddleFormers/pull/4362
add aoa config for minimax v25 by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/4374
add muon slice config for gated_attn by @xxyux in https://github.com/PaddlePaddle/PaddleFormers/pull/4375
add formers bot by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4381
[muon] fix xpu import muon by @umiswing in https://github.com/PaddlePaddle/PaddleFormers/pull/4377
fix minimax aoa config by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/4379
fix build_whl_docker by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4382
update muon slice func by @xxyux in https://github.com/PaddlePaddle/PaddleFormers/pull/4378
Test paddleformers bot by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4385
update init_optimizer by @xxyux in https://github.com/PaddlePaddle/PaddleFormers/pull/4367
Refactor zero_cost_checkpoint to use color-group-based 2D param colle… by @xxyux in https://github.com/PaddlePaddle/PaddleFormers/pull/4380
update fleet by @swgu98 in https://github.com/PaddlePaddle/PaddleFormers/pull/4390
Fix ZCC resume with moe_router_bias_update_rate > 0 by @Xing-lil in https://github.com/PaddlePaddle/PaddleFormers/pull/4391
[FinetuningArgs] Support compute_type=float32 for full precision training by @a31413510 in https://github.com/PaddlePaddle/PaddleFormers/pull/4389
ema state assembler adapter group gemm by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/4393
fix muon adapter zcc by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/4394
fix fc v2 by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/4395
fix zcc zero shape case by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/4396
Fix up and gate fused axis error by @sevenan2 in https://github.com/PaddlePaddle/PaddleFormers/pull/4399
[CI]update runs-on by @Liujie0926 in https://github.com/PaddlePaddle/PaddleFormers/pull/4370
[Fleet]change fleet to 0506 by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4401
[ci] fix build whl runs on by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4402
Revert "[ci] fix build whl runs on" by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4403
[CI] add log auto analysis by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4398
fix formers bot multi task by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4388
hotfix by @xuxinyi389 in https://github.com/PaddlePaddle/PaddleFormers/pull/4409
[XPU] add xpu upload whl action by @plusNew001 in https://github.com/PaddlePaddle/PaddleFormers/pull/4414
[CI] add scheduled trigger by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4412
[XPU] fix bugs for xpu upload whl by @plusNew001 in https://github.com/PaddlePaddle/PaddleFormers/pull/4417
update_ci_yaml by @wangyuwen1999 in https://github.com/PaddlePaddle/PaddleFormers/pull/4422
add muon_ns_coeffs by @xxyux in https://github.com/PaddlePaddle/PaddleFormers/pull/4413
fix glm args trans by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/4411
Fix reshape for zcc master weight save by @changeyoung98 in https://github.com/PaddlePaddle/PaddleFormers/pull/4419
Add empty skills directory by @a31413510 in https://github.com/PaddlePaddle/PaddleFormers/pull/4415
add allure for ci/ce by @Liujie0926 in https://github.com/PaddlePaddle/PaddleFormers/pull/4338
fix separate_mtp_headloss aoa for hf save & load by @Wennie396 in https://github.com/PaddlePaddle/PaddleFormers/pull/4420
Update qwen_single_card.json by @a31413510 in https://github.com/PaddlePaddle/PaddleFormers/pull/4425
Migrate Benchmark cases to the in-house code repository by @XieYunshen in https://github.com/PaddlePaddle/PaddleFormers/pull/4418
[CI] upload log analysis to monitor by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4427
[XPU][CI] Update CI baseline by @plusNew001 in https://github.com/PaddlePaddle/PaddleFormers/pull/4431
[ci] fix upload to monitor by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4432
[CE]update allure pages url by @Liujie0926 in https://github.com/PaddlePaddle/PaddleFormers/pull/4428
[CE]fix bugs by @Liujie0926 in https://github.com/PaddlePaddle/PaddleFormers/pull/4437
test upload log to monitor by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4434
fix qwen35 bug by @FeixLiu in https://github.com/PaddlePaddle/PaddleFormers/pull/4424
refactor(fp8): merge clear_origin_weight_when_offline_quant into offline_quant_expert_weight by @huangjiyi in https://github.com/PaddlePaddle/PaddleFormers/pull/4426
fix o_proj weight double transpose by @xingmingyyj in https://github.com/PaddlePaddle/PaddleFormers/pull/4440
[CE]Fix allure bug by @Liujie0926 in https://github.com/PaddlePaddle/PaddleFormers/pull/4446
[CE]fix allure by @Liujie0926 in https://github.com/PaddlePaddle/PaddleFormers/pull/4450
[CI] fix fleet ci log analysis by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4455
[CI] fix permission auth by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4456
[codex] Default disable_tqdm to true by @huangjiyi in https://github.com/PaddlePaddle/PaddleFormers/pull/4442
Drop dead benchmark switch from trainer configs by @huangjiyi in https://github.com/PaddlePaddle/PaddleFormers/pull/4448
[CI] test formers bot by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4457
[CI] fix checks steps by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4465
minmiax_aoa_gated_attn by @FeixLiu in https://github.com/PaddlePaddle/PaddleFormers/pull/4462
support hf ckpt for qwen35 by @FeixLiu in https://github.com/PaddlePaddle/PaddleFormers/pull/4452
add ai edited test by @liuhao2638 in https://github.com/PaddlePaddle/PaddleFormers/pull/4461
[CI]remove proxy for model_unittest by @Liujie0926 in https://github.com/PaddlePaddle/PaddleFormers/pull/4454
[CE]fix model_unittest ce by @Liujie0926 in https://github.com/PaddlePaddle/PaddleFormers/pull/4468
[CI] fix iluvatar check steps by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4464
Fix v2 shared_experts aoa by @changeyoung98 in https://github.com/PaddlePaddle/PaddleFormers/pull/4470
[CI] fix schedule regression by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4469
[CI] Test post comment by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4475
[CI] fix qwen3 vl by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4476
Remove timer training argument by @huangjiyi in https://github.com/PaddlePaddle/PaddleFormers/pull/4458
[CE]fix paddlefleet_ops install by @Liujie0926 in https://github.com/PaddlePaddle/PaddleFormers/pull/4478
[CE]fix paddlefleet_ops install latest by @Liujie0926 in https://github.com/PaddlePaddle/PaddleFormers/pull/4479
Fix minimax aoa by @changeyoung98 in https://github.com/PaddlePaddle/PaddleFormers/pull/4474
fix fp8_quant_weight by @Difers in https://github.com/PaddlePaddle/PaddleFormers/pull/4429
[CE]Fix paddlefleet_ops by @Liujie0926 in https://github.com/PaddlePaddle/PaddleFormers/pull/4480
[CI] test formers bot post comment by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4477
Revert "[CI] test formers bot post comment" by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4483
update muon for GLM5 by @xxyux in https://github.com/PaddlePaddle/PaddleFormers/pull/4444
[codex] Fix interval token throughput logging by @huangjiyi in https://github.com/PaddlePaddle/PaddleFormers/pull/4486
[CI]add paddlefleet_ops install by @Liujie0926 in https://github.com/PaddlePaddle/PaddleFormers/pull/4489
[Fleet CI]change fleet by @tianlef in https://github.com/PaddlePaddle/PaddleFormers/pull/4488
[Trainer] Raise Error for Configuration Conflict in Trainer by @hushenwei2000 in https://github.com/PaddlePaddle/PaddleFormers/pull/4430
[CI] fix paddle CI bot post comment by @zjjlivein in https://github.com/PaddlePaddle/PaddleFormers/pull/4492
update fleet_ops by @liuruyan in https://github.com/PaddlePaddle/PaddleFormers/pull/4493
support muon broadcast and reduce opt by @AlAuAu in https://github.com/Padd

Contributors

qhpeklh5959, zxcd, and 99 other contributors
