update paddlepaddle #9

AshburnLee · 2021-04-07T12:14:15Z

PR types

PR changes

Describe

* polish tensor pipeline. test=develop

* fix one error message * fix an error message * new fix three error messages * new fix three error messages * new fix some error * new fix one error message

* update, test=develop

* [Parallel UT]improve Parallel UT level on Windows/Linux * [Parallel UT]improve Parallel UT level on Windows/Linux * [Parallel UT]Improve Parallel UT level on Windows/Linux * [Parallel UT]Improve Parallel UT level on Windows/Linux * fix CI

* delete include framework.pb.h * fix error * delete fluid_train

* fix en doc for emb, test=document_fix; Change-Id: I4757e67caacd7189f068493ed45a7445f87ffb40

)
* refactor and simplify hook design
* fix reducer add hook error
* add Tensor.register_hook basic impl
* refine prepare data impl
* revert prepare data change
* support register_hook for Tensor
* add hook test in model
* polish tests and doc example
* fix double grad test failed
* remove reduce hook func
* fix set empty error
* polish code by comments
* change reduce_hook to mutable_hook
* remove useless tmp_ins
* fix shape code format error
* fix shape code format error

* new group * ci compatible fix * assert nccl

* add anchor generator op plugin
* add anchor generator unit_test
* remove dbg info
* remove redundant line
* replace assertion with paddle enforce
* dynamic plugin replaces assertion with paddle enforce
* anchor generator support dynamic shape on spatial axis
* anchor generator test with fp16, dynamic shape
* add anchor generator test all
* add back main
* reduce test input size to not exceed the timelimit of ci
* change super to InferencePassTest for python2 compatibility
* reuse paddle operator anchor generator
* move creator construct to header with default
* add cuda ifdef
* reduce line
* change super to InferencePassTest for python2 compatibility
* fix anchor generator fp16 serialize setting
* split unittest from test_all
* restrict anchor generator input format before version 7234
* anchor generator only support greater than trt7.1
* change min_graph_size to 2
* min_graph size to 3 if dynamic shape
* reduce dynamic shape size to avoid trt search tactic too long to exceed time limit
* remove anchor from fetch list
* anchor generator support all trt version
* fix memory not allocated but if serialized

* upgrade vlog * train from dataset fetch optimize

* add custom init grad for backward function
* add custom init grad for backward function
* handle when the grad_tensor is none
* handle when the grad_tensor is none
* fix the args type error on windows platform
* modify the args order and doc
* format code
* add grad_tensor to xpu
* modify the grad_tensor type check
* add paddle.backward api to support multi tensors gradient compute
* add paddle.backward api to support multi tensors gradient compute
* add paddle.autograd module and backward api
* change tensor.backward func args
* modify tensor backward api
* remove create_graph inputs args
* add doc and example code for backward api
* when have the same tensor, throw error
* modify test Init func args
* modify the execute.Init func args in test files
* add paddle.autograd package in setup.py.in
* modify error msg, remove _run_backward method in class Tensor
* add test cases for backward api

* fix doc of MaxPool1D
* fix doc
* fix doc format error
* dbg
* fix doc
* dbg doc format test=document_fix
* fix format test=document_fix
* test doc
* remove - from doc
* fix indent
* remove space before bracket
* dbg format
* fix indent test=document_fix
* remove new line
* fix descrip of Shape test=document_fix
* add description for default value test=document_fix
* fix bug test=document_fix

* support control flow * support sync_parameters_buffers * fix the bug of sparse embedding

* add leaky_relu forward and backward in activation_op.cu

* graph engine demo
* upload unsaved changes
* fix dependency error
* fix shard_num problem
* py client
* remove lock and graph-type
* add load direct graph
* add load direct graph
* add load direct graph
* batch random_sample
* batch_sample_k
* fix num_nodes size
* batch brpc
* batch brpc
* add test
* add test
* add load_nodes; change add_node function
* change sample return type to pair
* resolve conflict
* resolved conflict
* resolved conflict
* separate server and client
* merge pair type
* fix
* resolved conflict
* fixed segment fault; high-level VLOG for load edges and load nodes
* random_sample return 0
* rm useless loop
* test:load edge
* fix ret -1
* test: rm sample
* rm sample
* random_sample return future
* random_sample return int
* test fake node
* fixed here
* memory leak
* remove test code
* fix return problem
* add common_graph_table
* random sample node &test & change data-structure from linkedList to vector
* add common_graph_table
* sample with srand
* add node_types
* optimize nodes sample
* recover test
* random sample
* destruct weighted sampler
* GraphEdgeBlob
* WeightedGraphEdgeBlob to GraphEdgeBlob
* WeightedGraphEdgeBlob to GraphEdgeBlob
* pybind sample nodes api
* pull nodes with step
* fixed pull_graph_list bug; add test for pull_graph_list by step
* add graph table;name
* add graph table;name
* add pybind
* add pybind
* add FeatureNode
* add FeatureNode
* add FeatureNode Serialize
* add FeatureNode Serialize
* get_feat_node
* avoid local rpc
* fix get_node_feat
* fix get_node_feat
* remove log
* get_node_feat return py:bytes
* merge develop with graph_engine
* fix threadpool.h head
* fix
* fix typo
* resolve conflict
* fix conflict
* recover lost content
* fix pybind of FeatureNode
* recover cmake
* recover tools
* resolve conflict
* resolve linking problem
* code style
* change test_server port
* fix code problems
* remove shard_num config
* remove redundant threads
* optimize start server
* remove logs
* fix code problems by reviewers' suggestions

Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>

* update trt engine addplugin name. * update

* support save/load single tensor
* compatibility modification according to unittest
* Some python2.7 don't have 'copyreg' modules
* Handle a syntax error.
* Dealing with compatibility problems on Mac.
* Dealing with compatibility problems on Mac.
* edit unittest to improve coverage.
* Modify the code according to the review comments
* Reduce redundant code.
* support for static graph loading dygraph state_dict
* edit code according to CI
* edit unittest
* edit unittest
* delete redundant file
* edit code according to Comments
* edit english doc
* edit english doc
* edit English DOC.
* get/set_tensor->get/set_value; return_numpy=False
* get/set_tensor->get/set_value; return_numpy=False
* edit unittest
* edit unittest
* polish code.

* use busybox run test on windows openblas
* fix error
* fix disable_quick and nightly label issue
* add retry on windows openblas
* fix bug
* use one file to run cpu and gpu tests
* fix with grep warning
* fix syntax error
* change run_unittest to run_unittest_gpu
* Update run_unittests.sh fix error

* support hyparallel, add topology * fix utest

* fix two error message * fix two error message * fix error * fix error * fix error * fix error

* fix yolobox teller condition * fix cuda double free bug

* fix test of affine_grid with rocm * fix test of affine_grid with rocm

* add PullSparseValue for pull sparse * fix bug for PullSparseValue * add test mode in lookuptable * revert API change * add comment for is_training

* print build summary * print build summary * print build summary * print build summary

…31240)
* update develop whl package name
* distinguish cpu and gpu name
* fix ref_gcc
* change whl name
* upgrade gcc 4.8 to 5.4 in ubuntu_dev
* update gcc4.8 to 5.4 in centos
* Upgrade pip from 18.0 to 20.0.1
* change 2.1.0_dev0 to 2.1.0.dev0 in gpu version

…x automatically (#31989) As the title

* improve performance of DepthwiseConv(NHWC)

* Ascend rc (#30483)
* Fix compilation on CANN20.1 and older (#30494) Fix compilation on CANN20.1 and older
* Add distribution supported (#30578) Add distribution supported
* Build parser for Hcom* operators (#30627) Build parser for Hcom* operators
* Pass device_ids info from launch to trainer. (#30632) Pass device_ids info from launch to trainer
* Add Hccl program group (#30642) Add Hccl program group
* Add startup bash files of test_ascend_group. (#30645) Add startup bash files of test_ascend_group
* cleanup (#30646) cleanup test_ascend_group.py
* [Feature] Build parser to support distributed training (#30658) [Feature] Build parser to support distributed training
* fix compilation on ascend-20.1 (#30722) fix compilation on ascend-20.1
* Dev/fix ascend string (#30749) Dev/fix ascend string
* code style (#30781) code style
* Merge ascend_optimizer and ascend_parser. (#30776) Merge ascend_optimizer and ascend_parser.
* Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug (#30797) Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug
* Add paddle ascend distribution training supported (#30796) Add paddle ascend distribution training supported
* pass cxx_flags to gloo cmake (#30857)
* Destroy session first. (#30954) Destroy session first.
* merge
* fix, test=develop
* fix, test=develop
* fix style, test=develop
* fix, test=develop
* fix
* fix log fatal, test=develop
* fix enforce style, test=develop
* fix, test=develop
* fix, test=develop
* fix rccl, test=develop
* fix test, test=develop
* fix, test=develop
* fix, test=develop
* fix, test=develop
* fix node_num, test=develop
* fix ids str, test=develop
* fix ids str, test=develop
* fix ids str, test=develop
* fix, test=develop
* fix, test=develop
* fix, test=develop
* fix, test=develop
* fix, test=develop
* fix, test=develop
* fix, test=develop
* fix, test=develop
* fix style code, test=develop
* fix style code, test=develop
* fix style code, test=develop
* fix style code, test=develop

Co-authored-by: hutuxian <hutuxian2011@sina.cn>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: Void Main <voidmain1313113@gmail.com>
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
Co-authored-by: dingsiyu <18369187719@163.com>
Co-authored-by: OleNet <olenet@126.com>

* added ut check on windows,notest,test=windows_ci
* debug,notest,test=windows_ci
* debug,notest,test=windows_ci
* fix bug,notest,test=windows_ci
* added ut check
* test for new ut add on windows
* test,notest,test=windows_ci
* fix bug,notest,test=windows_ci
* test
* test
* test
* test,notest,test=windows_ci
* test,notest,test=windows_ci
* check added ut on windows
* only fetch upstream develop
* modified according to comment
* Update run_unittests.sh
* Update run_unittests.sh

* graph engine demo
* upload unsaved changes
* fix dependency error
* fix shard_num problem
* py client
* remove lock and graph-type
* add load direct graph
* add load direct graph
* add load direct graph
* batch random_sample
* batch_sample_k
* fix num_nodes size
* batch brpc
* batch brpc
* add test
* add test
* add load_nodes; change add_node function
* change sample return type to pair
* resolve conflict
* resolved conflict
* resolved conflict
* separate server and client
* merge pair type
* fix
* resolved conflict
* fixed segment fault; high-level VLOG for load edges and load nodes
* random_sample return 0
* rm useless loop
* test:load edge
* fix ret -1
* test: rm sample
* rm sample
* random_sample return future
* random_sample return int
* test fake node
* fixed here
* memory leak
* remove test code
* fix return problem
* add common_graph_table
* random sample node &test & change data-structure from linkedList to vector
* add common_graph_table
* sample with srand
* add node_types
* optimize nodes sample
* recover test
* random sample
* destruct weighted sampler
* GraphEdgeBlob
* WeightedGraphEdgeBlob to GraphEdgeBlob
* WeightedGraphEdgeBlob to GraphEdgeBlob
* pybind sample nodes api
* pull nodes with step
* fixed pull_graph_list bug; add test for pull_graph_list by step
* add graph table;name
* add graph table;name
* add pybind
* add pybind
* add FeatureNode
* add FeatureNode
* add FeatureNode Serialize
* add FeatureNode Serialize
* get_feat_node
* avoid local rpc
* fix get_node_feat
* fix get_node_feat
* remove log
* get_node_feat return py:bytes
* merge develop with graph_engine
* fix threadpool.h head
* fix
* fix typo
* resolve conflict
* fix conflict
* recover lost content
* fix pybind of FeatureNode
* recover cmake
* recover tools
* resolve conflict
* resolve linking problem
* code style
* change test_server port
* fix code problems
* remove shard_num config
* remove redundant threads
* optimize start server
* remove logs
* fix code problems by reviewers' suggestions
* move graph files into a folder
* code style change
* remove graph operations from base table

Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>

heavengate and others added 30 commits March 31, 2021 15:16

Polish tensor pipeline (#31701)

e973bd7

* polish tensor pipeline. test=develop

delete cuda9 code (#31883)

ea738dd

fix one error message (#31904)

6f85e24

* fix one error message * fix an error message * new fix three error messages * new fix three error messages * new fix some error * new fix one error message

Adjust pipeline optimizer for 3d parallelism (#31939)

695dd37

* update, test=develop

Delete legacy C++ training user-interface (#31949)

d5b5004

* delete include framework.pb.h * fix error * delete fluid_train

fix compilation error on rocm, test=develop (#31991)

eb3199f

fix en doc for emb (#31980)

6b74486

* fix en doc for emb, test=document_fix; Change-Id: I4757e67caacd7189f068493ed45a7445f87ffb40

new group (#31682)

0774159

* new group * ci compatible fix * assert nccl

Support uint8_t for fill_constant_op (#31911)

980227f

Optimize the perf of SameDimsAdd CUDA Kernel (#31872)

4acc87b

LOG CLEAN (#31819)

0589ed2

* upgrade vlog * train from dataset fetch optimize

remove useless code (#32001)

9c5d028

Support control flow in DataParallel (#31625)

8460698

* support control flow * support sync_parameters_buffers * fix the bug of sparse embedding

fix doc problem (#32010)

1b6c1d3

fix use_softmax=False does not work, test=develop

68e7de2

[ROCM] fix depthwise conv failure on ROCM, test=develop (#31998)

a4b30a1

fix typo in spawn (#32017)

df5aff8

delete test_data_generator (#31987)

0e52cdf

fix random compile failed on windows (#32032)

0b42f48

add leaky_relu forward and backward in activation_op.cu (#31841)

4490e8a

* add leaky_relu forward and backward in activation_op.cu

[ROCM] fix softmax_with_cross_entropy_op (#31982)

9e06a64

update trt engine addplugin name. (#32018)

d918786

* update trt engine addplugin name. * update

update plugin creator name (#32021)

ed49b41

Add more ops to calculate output scales (#32036)

cd74b20

tianshuo78520a and others added 25 commits April 2, 2021 17:22

fix decorator in py2 (#32043)

bf10d56

[3D-Parallel:Sharding] Optimizations for supporting ERNIE 3.0 training (#31884)

69c874f

delete temporary files (#32055)

36687d7

Optimize elementwise_add_grad op, test=develop (#32051)

1e52f32

[ROCM] fix the backward maxpool (#32030)

a3b08ba

[Hybrid Parallel] Add Topology for hybrid communicate (#32011)

2e82b6c

* support hyparallel, add topology * fix utest

fix two error message (#32039)

9e8f903

* fix two error message * fix two error message * fix error * fix error * fix error * fix error

remove pass restrictions for skip-ln pass (#32081)

6d6ea56

[PaddleTRT] Yolov3 bugfix (#32064)

b17e36a

* fix yolobox teller condition * fix cuda double free bug

fix test of affine_grid with rocm (#32047)

78af100

* fix test of affine_grid with rocm * fix test of affine_grid with rocm

optimize compilation of operators using eigen (#31851)

187bf41

fix fc doc (#32084)

a17c369

Del cudnn6 code2 (#31986)

b8b82b7

Struct SparseValue && Bug Fix (#31721)

a881b4d

* add PullSparseValue for pull sparse * fix bug for PullSparseValue * add test mode in lookuptable * revert API change * add comment for is_training

print build summary (#32110)

e625f88

* print build summary * print build summary * print build summary * print build summary

update the TraceLayer.save_inference_model method with add file suffix automatically (#31989)

10af966

As the title

improve performance of DepthwiseConv(NHWC) (#31677)

363b25a

* improve performance of DepthwiseConv(NHWC)

[3D-parallelism] Hybrid Model Parallelism (#32074)

1e60a0c

bugfix for unit test test_segment_ops (#32116)

d91faf2

AshburnLee merged commit db9fc91 into AshburnLee:develop Apr 7, 2021
