Merge gpu graph to develop #59000

danleifeng · 2023-11-14T11:49:20Z

PR types

New features

PR changes

Others

Description

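Merge gpu graph to develop
Updates:

Implement distributed hybrid parallelism in PS (parameter server) mode to speed up model training and reduce GPU memory usage, meeting the needs of the 10-billion-scale ErnieSage graph model.

amp
sharding
recompute

Improve multi-machine, multi-GPU sampling and training to support graph models on the order of 10 billion nodes and 100 billion edges.

Subgraph partitioning
Multi-machine random walk
Cross-machine sampling
End-to-end training pipeline adaptation

6+ new graph training features

Continuous-valued features
Edge features
Multiple pairs
Incremental graph construction
Specifying train/infer nodes via a file
Weighted sampling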
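
Weighted sampling, the last item in the list above, generally means drawing neighbors with probability proportional to edge weight instead of uniformly. The sketch below illustrates that idea on a toy in-memory adjacency list. It is not the GPU graph engine's implementation; the `Graph` type, `sample_neighbors`, and `toy_graph` are hypothetical names introduced only for this example.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy graph: node id -> list of (neighbor id, edge weight).
Graph = Dict[int, List[Tuple[int, float]]]


def sample_neighbors(graph: Graph, node: int, k: int, rng: random.Random) -> List[int]:
    """Draw k neighbors of `node` with replacement, proportional to edge weight."""
    edges = graph.get(node, [])
    if not edges:
        return []  # isolated or unknown node: nothing to sample
    neighbors = [n for n, _ in edges]
    weights = [w for _, w in edges]
    return rng.choices(neighbors, weights=weights, k=k)


toy_graph: Graph = {0: [(1, 3.0), (2, 1.0)], 1: [(0, 3.0)], 2: [(0, 1.0)]}
samples = sample_neighbors(toy_graph, 0, 5, random.Random(0))
```

With these weights, neighbor 1 is expected to be drawn about three times as often as neighbor 2 over many samples. A production sampler would more likely precompute per-node tables and run on the GPU, which this sketch does not attempt.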

Pcard-77633

zhangbo9674

lgtm

lanxianghit

LGTM for flag

jeff41404 · 2023-12-08T06:54:11Z

python/paddle/incubate/operators/unzip.py

 from paddle.base.layer_helper import LayerHelper


-def unzip(input, lod):
+def unzip(input, lod, len):


The meaning of the new parameter len should be explained in the Args section below.

danleifeng · 2023-12-08

Will fix in the next PR.
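
A side note on the diff above: a parameter named `len` shadows Python's built-in `len` inside the function body, which is one more reason to document it clearly. The sketch below uses a hypothetical stand-in function, `unzip_like`, and not Paddle's actual `unzip` operator. It makes no claim about what the new parameter means.

```python
import builtins


def unzip_like(input, lod, len):
    """Hypothetical stand-in for illustration only; this is not Paddle's unzip."""
    # Inside this body, `len` refers to the argument, not the built-in function.
    is_shadowed = len is not builtins.len
    # The built-in is still reachable explicitly through the builtins module.
    return is_shadowed, builtins.len(input)


result = unzip_like([10, 20, 30], [0, 3], 3)
```

Calling `len(input)` directly inside such a function would fail when `len` is an integer, so any code in the body that needs the built-in has to go through `builtins.len` or the parameter has to be renamed.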

ZzSean

LGTM for CI-OP-Benchmark

jzhang533 · 2023-12-08T07:35:49Z

It's impossible to review such a huge PR in a limited timeframe.
I'd like to suggest manually skipping PR-CI-Static-Check for this PR.

Should we consider using more modern tooling to address the challenges that huge PRs cause in the codebase?

Split a huge PR into moderately sized PRs using ghstack or similar tools.
Consider using CODEOWNERS and merge rules to route each PR to the most suitable person for review.

cc: @XiaoguangHu01 @jeff41404 @phlrain @JiabinYang @sneaxiy

DrRyanHuang and others added 30 commits November 6, 2023 14:38

【PIR api adaptor No.233、234】 Migrate paddle.trunc/frac into pir (PaddlePaddle#58675)

25a60ed

add vsplit_unstack (PaddlePaddle#58683)

d5cfdb9

【PIR api adaptor No.213、214】 Migrate CosineSimilarity/stanh into pir (PaddlePaddle#58699)

d684360

[Docathon][Fix System Message No.7、11、25、28] (PaddlePaddle#58508)

19756c8

[XPU] add bfloat16 support for gaussian and uniform (PaddlePaddle#58662)

ffaabc0

* [XPU] add bfloat16 support for gaussian and uniform * fix zero dim.

【Hackathon 5th No.52】Add spmd sharding inference rules for unsqueeze to Paddle -part (PaddlePaddle#58296)

213cad8

* add unsqueeze spmd rules * fix bugs * fix bugs * modify the code based on the first review * fix bugs

【PIR API adaptor No.53、111、112、116、208、229、230】Migrate some ops into pir (PaddlePaddle#58384)

c5c3246

[PIR] Support select complex kernel for real_grad (PaddlePaddle#58665)

14e0c22

* fix * fix * fix * fix * fix * fix * fux * fix * add ut

[CINN]revise cinn fp16 matmul cbulas api to gemmex (PaddlePaddle#56845)

c1f7cf1

* change cinn fp16 matmul cbulas api ti gemmex * fix flag error * remove flag * fix flags * fix test * fix test * fix fp cublas gemmbatchedstrideex

【PIR api adaptor No.48、49】 Migrate cummax/min into pir (PaddlePaddle#58629)

f7147e8

[PIR]Migrate interpolate into pir (PaddlePaddle#58531)

5423774

[clang-tidy] No.48 enable clang-analyzer-unix.Malloc (PaddlePaddle#55658)

2f82110

* enable clang-analyzer-unix.Malloc rule in clang-tidy * fix 2 Malloc clang-tidy * add comment

[PIR] Allow pir program dynamically add attr (PaddlePaddle#58660)

bf162e6

* allow pir::Program dynamically add attribute * add seed for pir::Program * polish code

[clang-tidy] NO.15 enable cppcoreguidelines-pro-type-const-cast Part.1 (PaddlePaddle#58285)

6725e56

* fix test * fix test * revert context_pool * fix * CI

[Reshard] Support s to r on cross mesh (PaddlePaddle#58592)

d6e08a1

* add s2r in crossmesh * add s_to_r reshard * add s_to_r reshard * add s_to_r reshard * add s_to_r reshard

add test (PaddlePaddle#58670)

569207d

refine nccl op list for pir interpreter (PaddlePaddle#58651)

4070d90

Change cc_test_old to cc_test (PaddlePaddle#58710)

fc87d51

* change cc_test_old to cc_test * change cc_test_old to cc_test * fix pre * chang cc_test_old to cc_test

[PIR] Add Dtype Transfer for pir (PaddlePaddle#58397)

74d9cf7

* add log * add getkerneltype func by yaml * delete VLOG * update * change kernelkey to datatype * update * move util functions into pd_op_lower_to_kernel_pass

[PIR] Rewrite dead_code_elimination_pass, delete reorder_block_ops_pass and fix constant_folding_pass (PaddlePaddle#58732)

87ec82b

* fix dead_code_elimination_pass and delete reorder_block_ops_pass * update * update * update * update * update

【PIR API adaptor No.39、123】Migrate label_smooth & class_center_sample into pir (PaddlePaddle#58693)

9484e10

* add common.py * rm default_main_program && create new func * add static.program_guard * rm dynamic_and_pir_mode_test func

[AutoParallel] tensor.cc support auto parallel (PaddlePaddle#58656)

c01b4a3

* tensor.cc support auto parallel

【PIR API adaptor No.193-196】Migrate paddle.geometric.segment_max, paddle.geometric.segment_mean, paddle.geometric.segment_min, paddle.geometric.segment_sum into pir (PaddlePaddle#58579)

41ae2f8

[PIR] run llama model in pir mode. (PaddlePaddle#58167)

d034906

[CodeStyle][ruff] clean some F401 step: 3 (PaddlePaddle#58306)

1aa681c

[PIR] separate translation context for body block of while op (PaddlePaddle#58625)

47b4148

* fix * fix * trigger CI * fix

[Dy2St] pir dy2st unittest verification - Part 2 (PaddlePaddle#58686)

16d471b

--------- Co-authored-by: SigureMo <sigure.qaq@gmail.com>

[NewExe] Performance optimization for program interpreter (PaddlePaddle#58645)

2168263

* fix * refine * refine * fix * fix * fix * fix

danleifeng added 5 commits December 6, 2023 03:21

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into merge_gpugraph

aa8ce8b

fix unittest error and ctr2 debug; test=develop

b40c882

fix unittest error and ctr2 debug; test=develop

d1763e4

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into merge_gpugraph

48926aa

fix unittest error and ctr2 debug; test=develop

d85d762

PaddlePaddle locked and limited conversation to collaborators Dec 7, 2023

PaddlePaddle unlocked this conversation Dec 7, 2023

danleifeng added 6 commits December 7, 2023 12:27

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into merge_gpugraph

2569f42

fix unittest error and rm useless code; test=develop

ad999c8

fix unittest error and rm useless code; test=develop

023b8e9

fix unittest error and rm useless code; test=develop

633bcf6

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into merge_gpugraph

5a190ea

fix unittest error and rm useless code; test=develop

d10d7a3

sneaxiy approved these changes Dec 8, 2023

chenwhql approved these changes Dec 8, 2023

danleifeng requested a review from jzhang533 December 8, 2023 04:25

zhangbo9674 approved these changes Dec 8, 2023

lanxianghit approved these changes Dec 8, 2023

jeff41404 reviewed Dec 8, 2023

ZzSean approved these changes Dec 8, 2023

raindrops2sea approved these changes Dec 8, 2023

danleifeng merged commit d3ff254 into PaddlePaddle:develop Dec 8, 2023
27 of 29 checks passed

This was referenced Dec 11, 2023

fix cuda117 compile error and rm useless code for gpu graph #59892

Merged

fix set_constant error #59905

Merged

[cherrp-pick]fix cuda117 compile error and rm useless code #59915

Merged

This was referenced Jan 16, 2024

fix unique kernel variable name #60839

Closed

Fix unique #60840

Merged

danleifeng mentioned this pull request Jan 31, 2024

fix cpups training bug:executor trainer use_ps_gpu value #61406

Merged

danleifeng mentioned this pull request Feb 27, 2024

[cherrypick]fix cpups training bug:executor trainer use_ps_gpu value #62111

Merged
