
Merge gpu graph to develop #59000

Merged
Merged 2,489 commits on Dec 8, 2023

Conversation

danleifeng
Contributor

@danleifeng danleifeng commented Nov 14, 2023

PR types

New features

PR changes

Others

Description

Merge gpu graph to develop
Updates:

  1. Implement distributed hybrid parallelism in PS mode to accelerate model training and optimize GPU memory, meeting the needs of 10-billion-scale ErnieSage graph models.
  • amp
  • sharding
  • recompute
  2. Improve multi-machine, multi-GPU sampling and training to support graph models with tens of billions of nodes and hundreds of billions of edges.
  • subgraph partitioning
  • multi-machine random walks
  • cross-machine sampling
  • end-to-end training pipeline adaptation
  3. 6+ new features for graph training:
  • continuous-valued features
  • edge features
  • multiple node pairs
  • incremental graph construction
  • specifying train/infer nodes via file
  • weighted sampling
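The weighted-sampling feature listed above can be illustrated conceptually. The sketch below is a minimal standalone version of weighted neighbor sampling, not Paddle's actual implementation; the function name and signature are hypothetical.

```python
import random


def weighted_sample_neighbors(neighbors, weights, k, seed=None):
    """Sample up to k neighbors of a node, proportionally to edge weights.

    If the node has k or fewer neighbors, all of them are returned;
    otherwise k neighbors are drawn with probability proportional to
    their edge weights (with replacement, for simplicity).
    """
    rng = random.Random(seed)
    if len(neighbors) <= k:
        return list(neighbors)
    return rng.choices(neighbors, weights=weights, k=k)
```

A production sampler would typically sample without replacement and run on the GPU; this sketch only shows the weighting semantics.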

Pcard-77633

DrRyanHuang and others added 30 commits November 6, 2023 14:38
* [XPU] add bfloat16 support for gaussian and uniform

* fix zero dim.
…lePaddle#58296)

* add unsqueeze spmd rules

* fix bugs

* fix bugs

* modify the code based on the first review

* fix bugs
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* add ut
* change cinn fp16 matmul cublas api to gemmEx

* fix flag error

* remove flag

* fix flags

* fix test

* fix test

* fix fp16 cublas gemmStridedBatchedEx
* align dy(_dygraph_clip) and auto(_static_clip)

* fix stack&reduce_sum caused ci unittest fails

* import  _g_gradient_clip_ops in reshard

* import _g_gradient_clip_ops in rule_based_tuner

* add op.type == stack/reduce_sum in pipeline.py

* in static mode, async_add_n is only used in auto_parallel mode

* fix test_gradient_clip.py unittest

* remove cast op after stack & reduce_sum ops are removed
…55658)

* enable clang-analyzer-unix.Malloc rule in clang-tidy

* fix 2 Malloc clang-tidy

* add comment
* allow pir::Program dynamically add attribute

* add seed for pir::Program

* polish code
* add s2r in crossmesh

* add s_to_r reshard

* add s_to_r reshard

* add s_to_r reshard

* add s_to_r reshard
* change cc_test_old to cc_test

* change cc_test_old to cc_test

* fix pre

* chang cc_test_old to cc_test
* add log

* add getkerneltype func by yaml

* delete VLOG

* update

* change kernelkey to datatype

* update

* move util functions into pd_op_lower_to_kernel_pass
…ss and fix constant_folding_pass (PaddlePaddle#58732)

* fix dead_code_elimination_pass and delete reorder_block_ops_pass

* update

* update

* update

* update

* update
… into pir (PaddlePaddle#58693)

* add common.py

* rm default_main_program && create new func

* add static.program_guard

* rm dynamic_and_pir_mode_test func
* ✨ Refactor: enable new ir op and added new ir test

* Update python/paddle/tensor/math.py

Co-authored-by: Lu Qi <61354321+MarioLulab@users.noreply.github.com>

* ♻️ Refactor: updated test

* 🎨 Fix: updated code style

---------

Co-authored-by: Lu Qi <61354321+MarioLulab@users.noreply.github.com>
…dle.geometric.segment_mean, paddle.geometric.segment_min, paddle.geometric.segment_sum into pir (PaddlePaddle#58579)
---------

Co-authored-by: SigureMo <sigure.qaq@gmail.com>
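One of the commits above migrates paddle.geometric.segment_sum (and its mean/min variants) into PIR. The op's behavior can be sketched in plain Python for 1-D input; this is an illustration of the semantics, not Paddle's code.

```python
def segment_sum(data, segment_ids):
    """Sum the entries of `data` that share the same segment id.

    Mirrors the semantics of paddle.geometric.segment_sum for 1-D input:
    output[i] = sum of data[j] for all j with segment_ids[j] == i.
    Segment ids are assumed to be 0-based and sorted, as the Paddle op requires.
    """
    num_segments = max(segment_ids) + 1
    out = [0] * num_segments
    for value, seg in zip(data, segment_ids):
        out[seg] += value
    return out
```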
@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Dec 7, 2023
@PaddlePaddle PaddlePaddle unlocked this conversation Dec 7, 2023
Contributor

@zhangbo9674 zhangbo9674 left a comment


lgtm

Contributor

@lanxianghit lanxianghit left a comment


LGTM for flag

from paddle.base.layer_helper import LayerHelper


def unzip(input, lod):
def unzip(input, lod, len):
Contributor

The meaning of the new parameter len should be explained in the Args section below.
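A hedged sketch of how that request could be addressed, documenting `len` in the Args section. The parameter names come from the diff; the described semantics of `len` are an assumption, not Paddle's actual documentation.

```python
def unzip(input, lod, len):
    """
    Unzip `input` into rows according to `lod` offsets.

    Args:
        input (Tensor): the zipped input tensor.
        lod (Tensor): level-of-detail offsets describing row boundaries.
        len (int): fixed length each unzipped row is padded to
            (assumed meaning; note that `len` also shadows the Python builtin).

    Returns:
        Tensor: the unzipped, padded result.
    """
    # Placeholder body for illustration only; the real op is implemented in C++.
    raise NotImplementedError
```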

Contributor Author

Will fix in the next PR.

Contributor

@ZzSean ZzSean left a comment


LGTM for CI-OP-Benchmark

@jzhang533
Contributor

jzhang533 commented Dec 8, 2023

It's impossible to review such a huge PR in a limited timeframe.
I'd suggest manually skipping PR-CI-Static-Check for this PR.

Should we consider adopting modern tooling for the challenges that huge PRs cause in the codebase?

  • Split huge PRs into moderately sized PRs using ghstack or similar tools.
  • Use CODEOWNERS and merge rules to route each PR to a more suitable reviewer.
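A minimal CODEOWNERS fragment of the kind suggested above. The paths and team handles here are purely illustrative, not Paddle's actual ownership map.

```
# Route reviews for graph and distributed code to the (hypothetical) owning teams.
/paddle/fluid/framework/fleet/   @PaddlePaddle/graph-team
/python/paddle/geometric/        @PaddlePaddle/graph-team
/python/paddle/distributed/      @PaddlePaddle/dist-team
```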

cc: @XiaoguangHu01 @jeff41404 @phlrain @JiabinYang @sneaxiy
