
【PIR API adaptor No.258、295、299、307】 Migrate glu/rank/sgn/take into pir #59535

Merged: 14 commits into PaddlePaddle:develop on Dec 11, 2023
Conversation

@DrRyanHuang (Member) commented Nov 29, 2023

PR types

Others

PR changes

APIs

Description

  • New IR Python API adaptation upgrade: #58067
  • test_sgn: all unit tests pass (static-graph PIR adaptation added)
  • test_glu: all unit tests pass
  • test_take: all unit tests pass
  • rank: no unit tests found so far (a usage sketch covering all four APIs follows below)
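
The following sketch is illustrative only and not part of this PR's diff: it builds a static-graph program that calls the four migrated APIs. How the PIR execution path is enabled (e.g. the `FLAGS_enable_pir_api` toggle, or the framework's PIR test utilities that the unit tests presumably go through) is an assumption here.

```python
# Illustrative sketch, not from this PR. Assumes a Paddle build with PIR;
# enabling the PIR path (e.g. FLAGS_enable_pir_api) is an assumption.
import numpy as np
import paddle

paddle.enable_static()

main = paddle.static.Program()
startup = paddle.static.Program()
with paddle.static.program_guard(main, startup):
    x = paddle.static.data(name='x', shape=[4, 6], dtype='float32')
    y_glu = paddle.nn.functional.glu(x, axis=-1)   # No.258: halves the last dim
    y_rank = paddle.rank(x)                        # No.295: rank of x, here 2
    y_sgn = paddle.sgn(x)                          # No.299: elementwise sign
    index = paddle.arange(0, 4, dtype='int64')
    y_take = paddle.take(x, index)                 # No.307: gather from flattened x

exe = paddle.static.Executor(paddle.CPUPlace())
exe.run(startup)
outs = exe.run(
    main,
    feed={'x': np.random.randn(4, 6).astype('float32')},
    fetch_list=[y_glu, y_rank, y_sgn, y_take],
)
print([np.asarray(o).shape for o in outs])
```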

Review threads on python/paddle/tensor/attribute.py and python/paddle/tensor/math.py (outdated, resolved).
DrRyanHuang and others added 2 commits December 4, 2023 11:22
Co-authored-by: WangZhen <23097963+0x45f@users.noreply.github.com>
@0x45f (Contributor) commented Dec 5, 2023

Does this PR adapt the sgn API?

DrRyanHuang and others added 7 commits December 7, 2023 05:28
* [Inference] New executor support input hook and fix shape file collection in trt (#59466)

* [Inference] new executor support input hook

* update

* update

* ci(cinn): update cinn ci to support dynamic shape (#58996)

* test=cinnunit

* test=cinnunit

* test=cinnunit

* test=cinnunit

* update seed in top_p_sampling (#59494)

* refine pir interpreter nccl op check (#59515)

* fix compile bug (#59487)

* [auto parallel] add softmax backward spmd rule (#59039)

* [auto parallel] add softmax backward spmd rule

* update test to new eager parallel api

* [PIR+CINN]Part-2 Pybind IrParser.ParseProgram and Polish UT into check_run (#59449)

* [PIR+CINN]Support SubGraph Exporter for Unittest Platform

add unittest

fix UT not take effect

[PIR+CINN]Pybind IrParser.ParseProgram and Polish UT into check_run

fix cmake flags

remove VLOG

fix code comment

* fix conflict

* remove print

* fix UT

* add list.sort to fix random

* [Docathon][Fix System Message No.3、9、14、15]  (#58664)

* [PIR] support pd_op.expand convert to cinn_op.broadcast_to (#59437)

* pir cinn support multi group

* update

* update

* fix pir cinn pow op bug

* remove useless code

* update

* update

* [api.cc] Fix kernel_backend to actual_kernel_backend to enable CPU-fallback (#59499)

* [AutoParallel] Support view mechanism in auto parallel dygraph mode. (#59401)

* [AutoParallel] Support view mechanism in auto parallel dygraph mode.

* Polish code.

* Trans dist_tensor to contiguous.

* Add reshape backward code gen.

* Polish reshape implementation.

* Add yaml.

* Polish code.

* Fix reshape backward problems and add testcase.

* Fix some problems.

* Fix testcase.

* [PIR / Dy2static] Fix mnist - part 1 (#59447)


---------

Co-authored-by: chenzhiyang <1792266893@qq.com>
Co-authored-by: SigureMo <sigure.qaq@gmail.com>

* fit auto_parallel amp for llama (#59497)

* [PIR] Support for If grad execution of ControlFlow ops (#59200)

* support lower to kernel for if_grad op


* fix bugs and warnings

---------

Co-authored-by: zhangbo9674 <zhangbo54@baidu.com>

* [XPU] update communication context (2) (#59482)

* [XPU] update communication context (2)

this is a follow up to #59418

* bugfix

* typo

* 【Program/Backward】fix order of static backward (#59304)

* fix order of static backward

* fix some error in topo order

* remove useless breakpoint

* fix

* fix

* fix

* fix

* fix

* [XPU][PHI Kernels] support fused_rotary_position_embedding for xpu (#59480)

* add solve op into TRT GenericPlugin (#59424)

* 【PIR API adaptor No.174】 Migrate paddle.randint_like into pir (#58953)

* [CI improve] remove useless output for some unittest (#59436)

* Revert "[auto parallel] add softmax backward spmd rule (#59039)" (#59542)

This reverts commit d86f686.

* 【Hackathon 5th No.13】[Related PR] Added uint8 support for sign kernel -part (#59514)

* ✨ Feature: added uint8 support for sign

* ♻️ Refactor: updated docs and type support

* 🎨 Refactor: updated code style

* Fix compiling error when setting WITH_MKL=OFF. (#59283)

* [CINN] remove_fake_test_of_args_parse (#59504)

* add check_grad && refine code (#59539)

* 【pir】modify ir_backward to build If grad (#59520)

* add if_grad_op

* add if_grad_op

* modify

* [Semi-Auto] Support parallel cross entropy in static semi-auto training (#59187)

* adapt cross_entropy_with_softmax rule to phi

* support parallel cross_entropy in auto parallel

* small fix

* temporary save

* add unit test for parallel_cross_entropy

* resolve conflicts

* small fix

* Add random op to no check list (#59483)

* add no check

* add no check

* Update pir_op_test_no_check_list

* fix syncbn nan (#59089)

* [Paddle-TRT] Enforce use new executor for trt engine memory sharing (#59495)

* enforce use new executor for trt engine memory sharing

* update

* add ut

* fix bug

* [auto parallel]Open matmul auto parallel test in OpTest (#59503)

* test framework supports to_static and prim

* open check_auto_parallel in matmul op test

---------

Co-authored-by: cyber-pioneer <chenzhuo@tju.edu.cn>

* [AutoParallel] fix converter for 0-dim tensor (#59523)

* 【Hackathon 5th No.37】Add householder_product API to Paddle -part (#58214)

* add householder_product api

* fix codestyle

* fix bug, detail describe for tests

* codestyle

* fix type error desc

* codestyle, modify atol

* assert when k > n, support complex, add more test units

* codestyle

* remove unused norm func

* codestyle

* modify api param:A to x

* restore noqa

* remove unused func

* Update python/paddle/tensor/linalg.py

Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>

* Update python/paddle/tensor/linalg.py

Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>

---------

Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>

* [AutoParallel] set comm dist_attr for dist_matmul (#59524)

* [AutoParallel] rm infershape for dist_embedding (#59526)

* [AutoParallel] rm infershape for dist_embedding

* [AutoParallel] rm infershape for dist_embedding

* Update dist_embedding.py

* Add static graph support for "scaled_dot_product_attention" (#59498)

* Added static graph support for 'scaled_dot_product_attention'

* Add static graph support for "scaled_dot_product_attention"

* [AutoParallel] Fix optimizer InferMeta. (#59246)

* Fix optimizer infermeta.

* Add testcase.

* [XPU] Supports the different types of post dynamic quantization for conv and fc (#59307)

* fix bug (#59516)

* add linux compile requirements (#59443)

* add linux compile requirements

* update

* update

* polish code as PR 59200 review comments (#59549)

* [PIR]Fix call InterpolateInferMeta in PIR (#59550)

* [auto parallel] Shard optimizer API (#59342)

* [auto parallel] add squeeze/unsqueeze backward spmd rules (#59547)

* [PIR & Inference] Fix cf pass and mkldnn log (#59555)

* instance norm passed (#59541)

* 【Hackathon 5th No.14】Add combinations API to Paddle (#57792)

* [PIR] Support while grad exe (#59496)

* support lower to kernel for if_grad op

* add PD_DECLARE_KERNEL

* fix

* fix

* fix

* resolve conflict

* update

* update

* update

* update

* update

* update

* fix

* update

* update

* update

* update

* update

* update

* update

* update

* fix bugs and warnings

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: chen2016013 <cx2016013@163.com>

* [CINN] Translate pir::Tensor to ir::Tensor with Symbolic shape (#59196)

* [CINN] Replace fake SymbolicDimOp

* [CINN] Translate pir::Tensor to ir::Tensor with Symbolic shape

* fix

* fix

* fix compile

* fix compile

* fix

* fix bug in static shape

* runtime(cinn): update cinn jit instruction to support dynamic shape (#59470)

* runtime(cinn): update cinn jit instruction to support dynamic shape

* runtime(cinn): update cinn jit instruction to solve conflict

* [PIR] add python api for while op (#59565)

* [CINN] Move strong constraint branch unittest directory (#59501)

* Move strong constraint branch unittest directory

* Remove CINN_ONLY

* Remove add_subdirectory

* update variable_length_mem_eff_attn's unittest (#59568)

* add comments (#59372)

* add comments

* fix bugs

* fix bugs

* add cuda place test and precision test for if_op_test (#59564)

* Solve the problem of scale saving in PTQ (#59441)

* [XPU] add some bf16 ops (#59505)

* [PIR] Refine code for while_grad execute (#59566)

* support lower to kernel for if_grad op

* add PD_DECLARE_KERNEL

* fix

* fix

* fix

* resolve conflict

* update

* update

* update

* update

* update

* update

* fix

* update

* update

* update

* update

* update

* update

* update

* update

* fix bugs and warnings

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* add debug

---------

Co-authored-by: chen2016013 <cx2016013@163.com>

* [oneDNN] optimize elementwise_add/sub for swin_transformer (#59421)

* [PIR] adjust the member function name in if_op (#59567)

* fix multi encoder adaptive seqlen wrong (#59548)

* sharding stage 1 check diff lr and use param decay fn (#59537)

* [PIR] fix ci conflict, test=document_fix. (#59595)

* [auto parallel]Add matmul auto parallel test (#59507)

* test framework supports to_static and prim

* add matmul auto parallel test in op test

---------

Co-authored-by: cyber-pioneer <chenzhuo@tju.edu.cn>

* Polish bfloat16 main_grad unittest for data parallel and sharding stage1. (#58842)

* Polish bfloat16 main_grad unittest for data parallel.

* Optimize unittest of sharding stage1.

* Polish codes and add check of weights.

* Polish unittest for sharding stage1.

* Revert some minor changes.

* Polish the compare of parameters.

* Compute loss in float32.

* [auto parallel] shard optimizer enhance (#59575)

* 【PIR API adaptor No.36】check_numerics (#58879)

* fix docs bugs (#59285)

* fix docs bugs

* modify as suggested

* feat(new-ir): support nan_to_num (#59469)

* [PIR]Remove refresh_stopgradient in backward (#59579)

* remove refresh_stopgradient()

* remove test

* 【PIR/Dy2static】Fix pir test ---- PART II (#59532)



---------

Co-authored-by: chenzhiyang <1792266893@qq.com>

* [Dy2St] lower time > 100 in dy2st unittests (#59506)

* 【Op Profiling】Add operator run time profiling feature (#58809)

* [op] operator profiling

* [op] operator profiling

* fix ci

* remove redundant code

* code cleaning

* minor fix

* minor fix

* minor fix

* major code cleaning

* minor fix

* minor fix

* minor fix

* fix ci

* minor fix

* minor code style fixes

* minor code style fixes

* minor code style fixes

* minor fix

* fix ci

* minor fix

* minor fix

* minor code style fix

* minor code style fix

* fix compile err

* [auto parallel] add softmax backward spmd rule  (#59545)

* [auto parallel] add softmax backward spmd rule

* update test to new eager parallel api

* Revert "[auto parallel] add softmax backward spmd rule (#59039)"

This reverts commit d86f686.

* [auto parallel] add softmax backward spmd rule

* update test to new eager parallel api

* [Prim][PIR] full_like forward sink (#59534)

* prim full_like sink

* merge code

* update full_like

* remove code in rules.py

* adjust softmax code

* [OneDNN] Fix accuracy for matmul+binary_add fusion (#59527)

* [Reshard] Support r to p on cross mesh (#59367)

* wip: r2p reshard

* wip: fix suitable

* feat: r2p cross mesh

* fix: strategy registry

* fix: align with new api

* [Reshard] Support p to r on cross mesh (#59621)

* fix: typo

* fix: typo

* feat: reshard p2r

* 【PIR API adaptor No.314】 Migrate vander into pir (#59573)

* [Docathon][Fix System Message No.2] (#59295)

* fix system message in website

* fix

* fix

* fix

* [xpu] Register fast_where and forbid pass if remove cast bool (#59594)

Co-authored-by: newway <liuwei345@gmail.com>

* fix behavior of put_along_axis and take_along_axis (usability improvement No.43) (#59163)

* fix behavior of put_along_axis and take_along_axis

* fix error

* fix take_along_axis used in stat

* update

* fix build error

* add test for error

* add param broadcast

* use origin example

* add param include_self

* update param name

* modify ut

* update test case

* add error UT

* update

* [PIR & Inference] Add fused_weight_only_linear_pass (#59366)

* [Inference]Add matmul_to_weight_only_linear_pass

* fix test and rename pass

* fix the comment of test

* fix ci

* fix: fix test

* refactor: refactor pass and test

* refactor: refactor pass

* refactor: add fp16 test

* refactor: refactor pass

* refactor: refactor the opt_level

* fix: fix typo

* fix: fix ci compile error when without gpu

* refactor: refactor pass and test

* fix: fix conflict

* fix: fix conflict

* refactor: refactor opt_level in pass_test to 4

* docs: enrich docstrings for the Chinese and English documentation (#59271)

* docs(paddle.lr): enrich docstring content

Add the 17 strategies described in the LRScheduler Chinese documentation to the docstring

* docs(paddle.vision.transforms): add more concrete docstring examples

Update the example code in the RandomHorizontalFlip and RandomVerticalFlip docstrings

* [Paddle Inference] modify a check statement in memory_optimize_pass.cc (#59638)

* [auto parallel] fix pp reshard (#59598)

* [Paddle-Inference] GQA support fix mmha bug (#59351)

* 【pir】deal with  if build stop gradient  (#59585)

* merge

* add stop gradient

* comment

* [PIR & Inference] Add conv2dAddPass and conv2dAddActPass and conv2dAdd2ActPass (#59391)

* add conv2d_add_fuse_pass

* add all conv2d_fuse_pass and Modify passtest

* bug fix

* code style

* code style

* code style

* bug fix

* code style

* code style

* add test for new PassPattern

* [Auto Parallel]Fix coverage in distributed mode (#59560)

* test framework supports to_static and prim

* test coverage

* fix coverage

* support distribute coverage

---------

Co-authored-by: cyber-pioneer <chenzhuo@tju.edu.cn>

* rewrite master weight for amp training (#59052)

* rewrite master weight for amp training

* some optimizers does not support master weight

* cinn(dynamic): support run exp sub subgraph with dynamic shape graph (#59640)

 Modify the broadcast compute so that computations whose output shape matches the input shape support dynamic shapes
 Integrate with the bucket mechanism so the pipeline runs end to end without op schedule or group schedule
 Add a subgraph unit test for exp/sub with dynamic shapes.

* fix (#59589)

* 【PIR API adaptor No.253、310】Migrate cumulative_trapezoid,trapezoid into pir (#59481)

* 【PIR API adaptor No.238、239、240、241】 Migrate nn.initializer.XavierInitializer, nn.initializer.MSRAInitializer into pir (#59419)

* 【PIR API adaptor No.261、273、283、285、286、313、315】 Migrate is_tensor/median/nanmean/nansum/neg/Unflatten/var into pir (#59509)

* Add a pass to insert QDQ nodes before skip connection (#59009)

* [PIR]  Translate TensorArray Related Ops (#59633)

* translate tensor array related ops and adapt their executions

* fix

* fix

* fix

* to trigger CI

* fix

* fix for windows bug

* fix jit_setitem

* test

* update dygraph auto_parallel en API docs. (#59557)

* 【auto parallel】llama attention subgraph verification (#59491)

* auto parallel: llama attention and mlp

* llama mlp、attention dp + mp

* remove log

* skip test

* polish

* polish

* polish

* [CMake governance] Move DDim etc. to common (#59105)

* fix conflict

* exception

* kunlun ci

* WIN_CI

* setup.py

* bug_fix

* hash

* auto_code_gen_WIN_CI

* inference_CI

* use_common_enforce

* delete pir_enforce

* delete_error

* change_cmake

* conflict

* cmake

* mac_CI

* inference_copy

* delete_pybind_common

* paddle_test

* split ddim constructor

* cc_test

* use cinn::common

* copy_infer

* delete_layer_test_new

* bug_fix

* infer

* fix inference bug

* conflict

---------

Co-authored-by: winter-wang <1030748926@qq.com>

* [Fix UT] fused_weight_only_linear_pass unittest modify (#59651)

* unittest fix

* code style

* code style

* [PIR] Add check for If grad test (#59590)

* support lower to kernel for if_grad op

* add PD_DECLARE_KERNEL

* add debug

* add precision test for if_op_test

---------

Co-authored-by: zhangbo9674 <zhangbo54@baidu.com>

* [PIR]Gen check DataType (#59354)

* [Auto Parallel] Update Gradient Synchronization in Static Mode (#59057)

* completion bw partial

* debug

* bugfix

* insert param grad allreduce by partial

* reorder allreduce for opt

* fix typos

* add grad sync unittest

* sp unittest

* fixed unittest

* [Paddle-TRT] custom operator support generating plugin automatically (#58976)

* [AutoParallel][PIR] Support new ir for the visualize tool (#59195)

* merge from openvino master

* add InterpreterRunTime() to record interpreter's run time

* add profiler helper static to produce json file

* add color map and support perfetto format

* recover codes

* control include env for gpu_timer.h

* fix logic for profiler_helper_static.py

* fix build error

* fix build error

* recover thirdparty

* add flag control: not support new ir now

* set auto_parallel_profiler flag to false

* fix

* add auto_parallel_profiler as command parameter

* fix value name

* support gettimeofday for win env

* fix win build error

* fix win build error

* use job_type_to_id

* Fixed repeatedly timing the same stream

* add step line for timeline

* add step timeline and fix logic when job overlap

* update time record logic

* fix bug when start profile start from none zero step

* fix note

* remove FLAGS_auto_parallel_profiler

* use run config instead FLAGS_auto_parallelxx

* fix color map logic

* fix color map logic

* fix bug when log step does not start from 0

* fix

* fix

* don't use set_enable_auto_parallel_profiler

* fix bug

* disable auto_parallel_profiler when not open flag by command line

* fix bug

* remove resettime

* fix build bug

* fix

* remove set enable

* fix build error

* fix build error

* fix build error

* fix ci error

* fix

* fix run error

* fix

* fix

* fix calculate_stream_timer logic

* remove fluid head

* fix build error

* set default value for enable_job_schedule_profiler

* support new ir

* fix is_communication_op logic

* fix

* fix build error

* recover IsCommunicationOp

* fix code_style

* [CINN]Refine StaticShapeGroupScheduler code while learning logic (#59540)

* [CINN]Refine StaticShapeGroupScheduler code while learning logic

* fix comment

* [Dy2St] Run PT in SOT mode only (#59658)

* fix clang-tidy modernize-use-nullptr error (#59626)

* [CodeStyle][ruff] clean some F401 step: 5 (#59576)

* Enhanced RNG State Management with Index-Based Control for Graph-Safe Tensor Parallelism (#58859)

* allow multiple rng state in generator

* fix get_rng_state

* Disable test for coverage cuda12 (#59556)

* Disable test for coverage cuda12

* Fix

* fix cmake

* fix cmake

* fix dist test

* fix

* fix

* [Paddle-TRT] Add size op convert (#59563)

* [PIR]Open more PIR UTs (#59657)

* [CINN] Make Resize Buffer Safer (#59014)

Make Resize Buffer safer: the old buffer resize didn't consider loads; we now add support for them.

This PR also contains some of the safer UpdateBufferAxis code from #59209.

We will also clean it up in PR #59209.

* [PIR]Fix nansum fp16 ut (#59666)

* Polish the error message and check for flash_attn. (#58345)

* Polish codes of flash_attn.

* Add more log for debugging.

* Allow dq, dk, or dv to be nullptr in flash_attn_grad.

* Use temporary tensor when k or v does not have gradient and add unittest.

* Add skipIf in unitttest.

* add md5sum for tensor (#59606)

* change_cc_test_old_f (#59619)

* [Dy2St] Add `enable_to_static_guard` for dy2st uts (#59670)

* Fix block_idx bug for auto parallel (#59596)

* Fix block_idx bug for auto parallel

* Fix typos

* fix (#59645)

* support_windows_cuda12 (#59665)

* fix assign kernel (#59609)

* add backward infer log (#59543)

* fix bug (#58400)

* [PIR] Add Three OPs with ReifyReturnTypeShapes (#58368)

* Add ReifyReturnTypeShapes

* Fix UT & fix op output & DimOfShapedTypeOpInterfacePattern

* Add some to do

* Alias DDim in phi (#59671)

* [3/4] CUDNNv8 ResNet Fusion: Add fused_donv_drelu_dbn OP (#58986)

* Rename output

* Add fused_dconv_drelu_dbn_op

* Add to CI test

* Review changes

* fix typos (#59679)

* fix typos, test=document_fix

* fix typos, test=document_fix

* [PIR]Choose op by value type in PIR apis (#59605)

* [Add] test atleast_xd pir backward  (#59365)

* [Change] keep tensor from input

* [Change] atleast input for pri

* [Change] test for pir

* [Change] pir grad from z to x

* fix test_decayed_adagrad_op (#59486)

* fix sharding stage3 main_grad bug (#59611)

* [Dy2St] pir dy2st unittest verification - Part 13 (#59517)


---------

Co-authored-by: SigureMo <sigure.qaq@gmail.com>

* [CodeStyle][ruff] clean some F401 step: 6 (#59584)

* clean F401

* fix

* clean

* RollBACK `python/paddle/base/__init__.py`

* RollBACK `python/paddle/__init__.py`

* rollback

* [XPU] add some bf16 ops and update xdnn (#59653)

* [PIR]support set value attribute by value. (#59656)

* [AutoParallel] complete chunk_id attr in backward&update phase (#59522)

* [AutoParallel] complete chunk_id attr in backward&update phase

* Update backward.py

* update fill_constant complete

* update complete chunk_id

* complete loss_grad_op

* fix complete first grad op

* [Dy2St] Remove duplicate dy2st resnet test (#59492)

* [auto parallel] stack support 0d tensor (#59655)

* Wint8 gemm and gemv opt (#59291)

* fpAintB split-k

* workspace

* fix error

* just_for_llama13b_bsz64-128

* llama13 opt

* fix scale type of weight only quant

* draft gemv batched

* accuracy fix

* m size dispatch for gemv and gemm

* fit dispatch

* refine gemv

* remove useless kernel

* refine

* fix bug for split-k-limit

* fix bug for half scale

* weight quant kernel fit for half scale

* fix bf16 compile

* fix sm70 autogen

* fix sm70 compile error

* fix code style

* update

* update

* code-style

* code-style

* windows compile fix

* code-style

* fix merge bug

---------

Co-authored-by: wwbitejotunn <wwbitejotunn@outlook.com>

* support reduce_min reduce_pro mod flood_div ops (#59650)

* [Dy2St] `enable_to_static_guard` full rollout 6-15 (#59691)

* [Dy2St] pir dy2st unittest verification - Part 14 (#59546)



---------

Co-authored-by: SigureMo <sigure.qaq@gmail.com>

* [compiler opt]change_cc_test_old (#59620)

* update

* update

* update

* Update CMakeLists.txt

* Update CMakeLists.txt

* [CINN] Strong constraint branch support dynamic shape (#59309)

* Strong Constraint Branch

* NoInlineTranslator (#84)

* Adapt adt to pir

* Move FLAGS_cinn_enable_map_expr_schedule location

* Apply new group schedule

* Remove useless log

* Remove adt unittest

* Solve merge conflicts

* Fix typo

* Fix merge conflicts

* Add unit test

* Fix cmake

* Add test_cinn_sub_graph_map_expr

* transfer origin code to develop

* Fully support ShapeDialect

* TranslateDimExpr

* Fix dim_expr_simplifier

* Generate ir with symbolic

* Solve compile error

* SymbolicDim to SymbolicDimOp when Translate ir

* Solve some conflict

* Solve conflict

* ShapeAnalysis

* fix compile error

* Disable dynamic shape

* Add scale generate_equation

* Add cpp unittest

* Add cpp unittest

* Change VLOG priority

* Clean IndexDot

* Unittest

* Cancel SimplifyDotBI

* UniqueId ResetSeqNumber

* input_spec and kDynamic

* map_expr_test

* [Auto Parallel] Fix run scripts for hybrid unittests (#59701)

* Fix program_translator bug for subblock (#59724)

* 【AutoParallel】Promote fuselinear pass in auto-parallel (#59188)

* add fused_linear_promotion pass

* add promote_fusedlinear pass

* support sp without dp

* delete some log

* fix bug in process_mesh

* add sp+dp support

* fix bug when dp_group is None

* modify code according to review

* add unit_test

* add unit_test

* fix the test

* 【PIR / Dy2static】Fix pir test 3 (#59696)


---------

Co-authored-by: chenzhiyang <1792266893@qq.com>

* [SOT] Add `paddle.metric` to paddle API (#59698)

* [Dy2St] Run original partial program call to avoid CUDA error 700 (#59687)

* fix test_activation_op (#59618)

* Fix sot eval and test len (#59408)


---------

Co-authored-by: SigureMo <sigure.qaq@gmail.com>
Co-authored-by: zhangbo9674 <zhangbo54@baidu.com>

* Add introduction about Open Source Community to Readme (#59704)

* Don't Merge

* make conflict

* reset

* add community

* Update communication section in README.md

---------

Co-authored-by: jzhang533 <jzhang533@gmail.com>

* 【auto parallel】Llama decoder subgraph verification (#59580)

* auto parallel: llama attention and mlp

* llama mlp、attention dp + mp

* remove log

* remove log

* polish

* polish

* polish

* polish time out

* polish time out

* 【PIR API adaptor No.161、162】Migrate `paddle.vision.ops.nms` `paddle.nn.functional.one_hot`  into pir (#58735)

* 【PIR API adaptor No.28】Migrate `paddle.vision.ops.box_coder` into pir (#59616)

* [PHI]Open PHI shared Lib by default (#59345)

* open phi shared default

* format code

* update code

* fix bugs

* open phi shared

* fix test_lstm for pir (#59608)

* [Semi-auto]Add srp in dist_tensor (#59683)

* add srp in disttensor

* add srp in disttensor

* add srp in disttensor

* add srp in disttensor

* add srp in disttensor

* [OneDNN] Optimize fused elementwise kernel (#59663)

* [PIR] Relax the restrictions of IF Verify Region (#59689)

* fix

* Fix program_translator bug for subblock

---------

Co-authored-by: chenruibiao <chenruibiao@baidu.com>

* Merge into develop part-5 (#59644)

* part-3 cherry from: add check for cembedding (#55621)

* part-3 fix cherry from: add check for cembedding

* part-3 fix c_embedding

* fix test_gpt_with_pir caused by pir

* part-3 cherry from: [Distributed] Support dp/sharding overlap in  virtual pp (#55651)

* Add virtual pp and dp overlap

* add sharding/dp overlap

* add dp/vpp overlap

* fix code

* fix log

* part-3 cherry from: [cherry-pick] Integration flash attention 2 (#56015)

* [FlashAttn] add flash randomness control (#52902)

* add flash randomness control

* fix VLOG undefied

* [WIP] Integration flash attention 2 (#55758)

* Work for fa-2 padded fwd. Code to be cleaned.

* Work for fa2 unpadded fwd.

* Work for padded-bwd, dk get small diff on np.random.seed(0)

* Anyway I pass paddle's utest, except return softmax without dropout.

* Clean code.

* Modify interface.

* Clean code and add some check.

* Easy compile for dev.

* Fix ci.

* Fix ci-build.

* Add std c++17 option again.

* Limit max job when compiling fa2.

* Remove const_cast

* Add fwd params, to be cleaned.

* Clean code.

* Add bwd params.

* Clean code.

* Add enforce.

* Use v2.0.4

* Pass RNG state to fa2 capi

* Fix review.

* Add assert

* Skip compile for sm less than 80.

---------

Co-authored-by: Chitsing KUI <kuizhiqing@msn.com>

* part-4 cherry from: fix codestyle (#56066)

* part-4 cherry from(no change): Add assert for static and other plateform (#56044)

* part-4 cherry-pick from: dp and sharding coexist (#56096)

* dp and sharding coexist

* dp

* part-4 cherry from: [Distributed] Add debug information for processgroupnccl (#56441)

* add debug information

* fix log

* fix log

* add detach for pp

* part-4 cherry from: [BugFix]Fix bug in paddle.device.cuda.synchronize() (#56451)

* fix bug in synchronize

* fix bug in synchronize

* part-4 cherry from: add fused gradient (#57048)

* part-4 cherry from: [Distributed] add eager_communication_connection for eager mode in nccl (#57517)

* add eager_nccl_connection

* add eager_connection

* add eager_connection

* part-4 cherry from: Add auto growth allocator for CUDA pinned allocator (#57625)

* fix h2d bandwidth

* remove useless flags

* fix cherry pick #56066

* part-5 cherry from: Add allocation debug FLAGS (#57797)

* Add allocation debug FLAGS

* add sync after value set

* refine flags

* part-5 cherry from: fix softmax backward (#57971)

* part-5 cherry from: [Distributed]Optimize memory in processgroup (#58299)

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* part-5 cherry from: [Distributed]Add unbalance batch for virtual pp (#58383)

* add unbalanced batch for vpp

* add unbalanced batch for vpp

* add unbalanced batch for vpp

* fix

* fix comments

* fix kunlun compatibility issues

* fix test_fused_rotary_position_embedding.py

* fix allocator.h

* tinyfix

* fix conflicts

* fix new ir translator c_embedding failure

---------

Co-authored-by: ShenLiang <1422485404@qq.com>
Co-authored-by: umiswing <umiswing@foxmail.com>
Co-authored-by: Chitsing KUI <kuizhiqing@msn.com>
Co-authored-by: niuliling123 <51102941+niuliling123@users.noreply.github.com>
Co-authored-by: liuzhenhai93 <liuzhenhai93@outlook.com>
Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com>

* 【Hackathon 5th No.112】move read_file to phi - part (#59359)

* move read_file to phi; it runs in dygraph and may cause some bugs

* remove the static gen

* fix bug

* fix the code stype

* move the file to cpu

* remove the #include

* [PIR] Add artificial instruction: builtin_combine (#59669)

* fix

* fix

* fix

* [auto parallel] Dist tensor set value (#59706)

* fix reshape and reshard (#59688)

* [Docathon][Fix System Message No.12] test to fix (#59445)

* Fix pir comiler name id bug (#59642)

* fix pir compiler name id bug

* remove usless code

* remove code

* fix bug

* [Dy2St] pir dy2st unittest verification - Part 12 (#59378)

* add `test_legacy_and_pir_exe_and_pir_api`

* update

* add `test_tensor_memcpy_on_cpu` and gpu

* add debug info to yolov3

* fix test_declarative.TestInputSpec

* update yolov3

* judge params by name

* update test_declarative

* restore test_yolov3

* fix place test

* `assertTrue` -> `assertIn`

* revert test_tensor_memcpy_on_cpu

* skip api check gen for `assign_out_`

---------

Co-authored-by: SigureMo <sigure.qaq@gmail.com>

* Remove `:=` and update classifiers (#59733)

* add cross entropy test case (#59693)

* [Dy2static] Fix save problem in dy2static (#59709)

* rename conv2d_fusion op to fused_conv2d_add_act (#59431)

* [PIR] add operand_index api for Operation and fix cf pass (#59738)

* [PIR] add operand_index api for Operation and fix cf pass

* update

* [PIR] fix python cond api error in test_ifelse. (#59708)

* cinn(cmake): fix cmake error in dynamic (#59711)

* cinn(cmake): fix cmake error in dynamic

* cinn(cmake): move symbolic subgraph to a subdirectory

* fix typos, test=document_fix (#59754)

* 【Hackathon 5th No.27】Add select_scatter API to Paddle -part (#59343)

* support select_scatter op

* fix example code

* fix sc

* update example

* remove unused files

* add name

* fix conflict

* update

* remove

* update

* add type

* update type

* [SOT]Fix Train/Eval Switch BUG in SOT (#59747)

* [SOT]Fix Train/Eval Switch BUG in SOT

* rm usless code

* add pp bug report (#59762)

* add profiler_range (#59634)

* add profiler_range

* add test cases and fix logic

* Update test_job_schedule_profiler_range.py

* Update CMakeLists.txt

* Update CMakeLists.txt

* add test case

* [PIR+CINN]Support Adapative Parse and Check Feed/Fetch in SubGraph Exporter (#59749)

* [Prim][PIR] stack prim sink (#59713)

* stack sink

* prim stack sink

* stack sink

* [CINN] Fix inline_translator_test compile error (#59737)

* Fix inline_translator_test

* cinncore -> absl

* 【PIR API adaptor No.266、269】 Migrate ldexp, logaddexp into pir (#59582)

* 【PIR API adaptor No.80、81】 Migrate fused_layer_norm and FusedDropoutAdd into pir (#59420)

* [SOT] fix sot call locals (#59710)

* [PIR] restore AST+PT test and refine code (#59668)

* [Dy2St] Run PT in SOT mode only

* restore legacy ir test and refine code

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: SigureMo <sigure.qaq@gmail.com>

* [Auto_Parallel] update path lists for pir (#59757)

* [auto parallel] CrossMeshReshard for: p2r, p2s, r2p, r2s, s2p, s2r, s2s. (#59758)

* Remove skip_transform of index_put (#59664)

* add type promotion logic for eager between tensor and tensor (#59518) (a short illustration follows this commit's change list)

* add eager T+T logic.

* remove useless file.

* remove useless line.

* fix

* update

* fix note.

* mv common logic to common dir.

* fix

* remove deal for int.

* remove int.

* only for complie

* ignore other type promotion for now.

* new eager logic.

* fix bug, add where.

* fix

* add dtype check, warnning, rename .h

* add warnning

* add more log.

* bug fix

* fix by comment, make logic of eager_gen more readable.
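
As a rough illustration of what the tensor + tensor type promotion series above enables: mixed-dtype binary ops in eager mode compute in a promoted dtype instead of requiring matching dtypes. The specific float16 + float32 -> float32 rule below is an assumption based on common promotion semantics, not quoted from PR #59518.

```python
# Hypothetical illustration of eager T+T type promotion; the promotion
# table (float16 + float32 -> float32) is an assumption, not taken from
# the commit log above.
import paddle

a = paddle.ones([2], dtype='float16')
b = paddle.ones([2], dtype='float32')
c = a + b  # with T+T promotion, the result is computed in the promoted dtype
print(c.dtype)  # expected: paddle.float32
```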

* [auto parallel] embedding subgraph test (#59681)

* fix bug in xpu pp (#59753)

* fix elementwise inferspmd (#59707)

* 【Dy2static / PIR】fix apply pass + bn accuracy problem + test_resnet.py (#59774)

* fix order of static backward

* fix some error in topo order

* remove useless breakpoint

* fix

* fix

* fix

* fix

* fix

* adjust ir backward prune routine.

* fix

* fix cross_entropy_with_softmax vjp bug

* fix pre-commit!

* fix

* fix 3 unittest

* fix code format

* fix test_tensor_memcpy_on_gpu.py

* fix test_partial_program.py

* fix test_ptb_lm_v2

* [PIR]Using inplace batch norm in PIR

* fix apply pass error

* fix bn problem.

* fix test_resnet.py uniitest

---------

Co-authored-by: chenzhiyang <1792266893@qq.com>
Co-authored-by: 0x45f <wangzhen45@baidu.com>

* optimize set_value (#59425)

* optimize set_value

* fix none shape

* Distributed SaveLoad implementation for semi-auto strategy (#59659)

* exclude xpu

* demo of running dygraph distributed save load

* support save cross mesh state_dict

* polish

* fix compute overlap bug

* test save load in dp_mp unittest

* fix get local file bug and test

* delete useless files, and rename var

* polish

* format codes

* test use_dist

* fix test

* info to debug

* fix test

* fix

* fix coverage ci

* fix docstring codes

* rename and codestyle

* get rid of use_dist argument

* fix copyright

* polish doc

* polish

* polish

* use tmp file path

* [AutoParallel] add chunk_id attr for dist_op (#59719)

* [AutoParallel] add chunk_id attr for dist_op

* update utils funcs

* update dist ops

* fix dist_ctx

* fix dist_default

* add silu as dist_elemwise

* 【pir】 modify 5/6 case of test_cond.py with append_backward  (#59732)

* first modify

* clear modify

* modify if_grad2

* append_full_like

* add new test

* modify add_n

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
Co-authored-by: 6clc <chaoliu.lc@foxmail.com>
Co-authored-by: lzy <569782149@qq.com>
Co-authored-by: wanghuancoder <wanghuan29@baidu.com>
Co-authored-by: risemeup1 <62429225+risemeup1@users.noreply.github.com>
Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>
Co-authored-by: Aurelius84 <zhangliujie@baidu.com>
Co-authored-by: Jinyuan Huang <88757735+BernieHuang2008@users.noreply.github.com>
Co-authored-by: hong <43953930+phlrain@users.noreply.github.com>
Co-authored-by: RuohengMa <120699764+RuohengMa@users.noreply.github.com>
Co-authored-by: Ghost Screaming <mofengshenjieII@163.com>
Co-authored-by: xiongkun <xiongkun03@baidu.com>
Co-authored-by: chenzhiyang <1792266893@qq.com>
Co-authored-by: SigureMo <sigure.qaq@gmail.com>
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
Co-authored-by: chen2016013 <111894720+chen2016013@users.noreply.github.com>
Co-authored-by: zhangbo9674 <zhangbo54@baidu.com>
Co-authored-by: XiaociZhang <zhangxiaoci@baidu.com>
Co-authored-by: lijin23 <41257772+lj970926@users.noreply.github.com>
Co-authored-by: zhink <33270771+zhink@users.noreply.github.com>
Co-authored-by: cyberslack_lee <luhputu0815@gmail.com>
Co-authored-by: PommesPeter <54879512+PommesPeter@users.noreply.github.com>
Co-authored-by: Yiqun Liu <Xreki@users.noreply.github.com>
Co-authored-by: BiynXu <62832681+BiynXu@users.noreply.github.com>
Co-authored-by: xiaoguoguo626807 <100397923+xiaoguoguo626807@users.noreply.github.com>
Co-authored-by: Yichen Zhang <32740647+pkuzyc@users.noreply.github.com>
Co-authored-by: xingmingyyj <135400902+xingmingyyj@users.noreply.github.com>
Co-authored-by: ceci3 <ceci3@users.noreply.github.com>
Co-authored-by: Charles-hit <56987902+Charles-hit@users.noreply.github.com>
Co-authored-by: cyber-pioneer <chenzhuo@tju.edu.cn>
Co-authored-by: zhaoyingli <86812880+zhaoyinglia@users.noreply.github.com>
Co-authored-by: coco <69197635+cocoshe@users.noreply.github.com>
Co-authored-by: zachary sun <70642955+sunzhongkai588@users.noreply.github.com>
Co-authored-by: Chenghao Liu <chenghao1652@126.com>
Co-authored-by: Travis-Lee <lixiang.fr@hotmail.com>
Co-authored-by: Liujie0926 <44688141+Liujie0926@users.noreply.github.com>
Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
Co-authored-by: WangZhen <23097963+0x45f@users.noreply.github.com>
Co-authored-by: Yuang Liu <liuyuang@baidu.com>
Co-authored-by: NetPunk <69072522+Patrick-Star125@users.noreply.github.com>
Co-authored-by: zhangbo9674 <82555433+zhangbo9674@users.noreply.github.com>
Co-authored-by: chen2016013 <cx2016013@163.com>
Co-authored-by: Zhang Zheng <32410583+ZzSean@users.noreply.github.com>
Co-authored-by: winter-wang <78149749+winter-wang@users.noreply.github.com>
Co-authored-by: HongyuJia <jiahongyu@baidu.com>
Co-authored-by: Winters Montagne <118546135+WintersMontagne10335@users.noreply.github.com>
Co-authored-by: zhouzj <41366441+zzjjay@users.noreply.github.com>
Co-authored-by: houj04 <35131887+houj04@users.noreply.github.com>
Co-authored-by: Xinyi_LI <xinyi1.li@intel.com>
Co-authored-by: csy0225 <78470701+csy0225@users.noreply.github.com>
Co-authored-by: 张春乔 <83450930+Liyulingyue@users.noreply.github.com>
Co-authored-by: zbt78 <1095497213@qq.com>
Co-authored-by: xiaoye <50870160+xiaoyewww@users.noreply.github.com>
Co-authored-by: ooo oo <106524776+ooooo-create@users.noreply.github.com>
Co-authored-by: kevin <chengyf112@gmail.com>
Co-authored-by: Zhang,Lirong <56445728+zhanglirong1999@users.noreply.github.com>
Co-authored-by: Wen Sun <35923278+HermitSun@users.noreply.github.com>
Co-authored-by: Liuyinfeng <30849840+gitliuyf@users.noreply.github.com>
Co-authored-by: newway <liuwei345@gmail.com>
Co-authored-by: YibLiu <68105073+YibinLiu666@users.noreply.github.com>
Co-authored-by: Longzhi Wang <583087864@qq.com>
Co-authored-by: HankYang <97599656+Hhankyangg@users.noreply.github.com>
Co-authored-by: 周周周 <39978853+zhoutianzi666@users.noreply.github.com>
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com>
Co-authored-by: Zhang Ting <zhangting_2017@163.com>
Co-authored-by: Lu Qi <61354321+MarioLulab@users.noreply.github.com>
Co-authored-by: Leo Chen <39020268+leo0519@users.noreply.github.com>
Co-authored-by: kangguangli <kangguangli@hotmail.com>
Co-authored-by: wuhuachaocoding <77733235+wuhuachaocoding@users.noreply.github.com>
Co-authored-by: liuzhenhai93 <liuzhenhai93@outlook.com>
Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>
Co-authored-by: winter-wang <1030748926@qq.com>
Co-authored-by: Zhan Rongrui <46243324+zrr1999@users.noreply.github.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>
Co-authored-by: Sonder <55493212+AndSonder@users.noreply.github.com>
Co-authored-by: Zhenghai Zhang <65210872+ccsuzzh@users.noreply.github.com>
Co-authored-by: gouzil <66515297+gouzil@users.noreply.github.com>
Co-authored-by: Frank Lin <eee4017@gmail.com>
Co-authored-by: tianshuo78520a <707759223@qq.com>
Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com>
Co-authored-by: Huihuang Zheng <zhhsplendid@163.com>
Co-authored-by: ShenLiang <1422485404@qq.com>
Co-authored-by: Galaxy1458 <55453380+Galaxy1458@users.noreply.github.com>
Co-authored-by: Ruibiao Chen <chenruibiao@baidu.com>
Co-authored-by: xuxinyi389 <104957571+xuxinyi389@users.noreply.github.com>
Co-authored-by: LiYuRio <63526175+LiYuRio@users.noreply.github.com>
Co-authored-by: Tian Zheng <tizheng@nvidia.com>
Co-authored-by: Wang Xin <xinwang614@gmail.com>
Co-authored-by: megemini <megemini@outlook.com>
Co-authored-by: tianhaodongbd <137985359+tianhaodongbd@users.noreply.github.com>
Co-authored-by: Wang Bojun <105858416+wwbitejotunn@users.noreply.github.com>
Co-authored-by: wwbitejotunn <wwbitejotunn@outlook.com>
Co-authored-by: Haohongxiang <86215757+haohongxiang@users.noreply.github.com>
Co-authored-by: lzydev <lizhiyu02@baidu.com>
Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
Co-authored-by: jzhang533 <jzhang533@gmail.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: wentao yu <yuwentao126@126.com>
Co-authored-by: umiswing <umiswing@foxmail.com>
Co-authored-by: Chitsing KUI <kuizhiqing@msn.com>
Co-authored-by: niuliling123 <51102941+niuliling123@users.noreply.github.com>
Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com>
Co-authored-by: Zero Rains <linjunlu@zerorains.top>
Co-authored-by: feifei-111 <2364819892@qq.com>
Co-authored-by: zxcd <228587199@qq.com>
Co-authored-by: 0x45f <wangzhen45@baidu.com>
Co-authored-by: Difer <707065510@qq.com>
Co-authored-by: pangengzheng <117730991+pangengzheng@users.noreply.github.com>
@0x45f merged commit 217cc54 into PaddlePaddle:develop on Dec 11, 2023
29 checks passed
@DrRyanHuang deleted the DDD branch on December 11, 2023 03:02