add fix op run order pass #34427

sneaxiy · 2021-07-27T09:08:07Z

PR types

Performance optimization

PR changes

APIs

Describe

Add fix_op_run_order pass.

Optimization for the MLPerf ResNet50 model.

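Profiling with nsys profile showed that Paddle's AllReduce time on a single node with 8 GPUs is noticeably slower than MxNet's. There are two causes:

1. MxNet issues only 2 AllReduce calls per batch, one FP32 and one FP16, whereas Paddle issues 8 per batch, 4 FP32 and 4 FP16. This PR handles the FLAGS_fuse_parameter_memory_size=0 case so that it produces 2 AllReduce calls, one FP32 and one FP16, the same as MxNet.
2. In MxNet, AllReduce is fairly well synchronized across GPUs (each GPU launches AllReduce at nearly the same moment). In Paddle it is not, which slows AllReduce down. This PR adds a fix_op_run_order_pass that fixes the OP execution order across processes.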
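The grouping outcome described in the first point can be sketched as follows. This is a minimal illustration, not Paddle's fusion code: `DType`, `GradInfo`, `FusedGroup`, and `GroupGradsByDType` are hypothetical names, and the sketch simply assumes that every gradient of the same dtype goes into one fused AllReduce group.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical types for illustration only; they are not Paddle classes.
enum class DType { FP16, FP32 };

struct GradInfo {
  std::string name;
  DType dtype;
  std::size_t bytes;
};

struct FusedGroup {
  std::vector<std::string> grad_names;  // gradients fused into one AllReduce
  std::size_t total_bytes = 0;          // size of the fused buffer
};

// Puts every gradient of the same dtype into a single group, so a model
// with FP32 and FP16 gradients ends up with exactly two groups.
std::map<DType, FusedGroup> GroupGradsByDType(
    const std::vector<GradInfo>& grads) {
  std::map<DType, FusedGroup> groups;
  for (const auto& g : grads) {
    assert(!g.name.empty());  // every gradient is expected to be named
    FusedGroup& group = groups[g.dtype];
    group.grad_names.push_back(g.name);
    group.total_bytes += g.bytes;
  }
  return groups;
}
```

With mixed FP32 and FP16 gradients as input, the returned map has two entries, which corresponds to one AllReduce per dtype per batch.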

With these two changes, the single-node 8-GPU speedup of the MLPerf ResNet50 model improves from 7.69 to 7.83. MxNet's single-node 8-GPU speedup is currently 7.92.

Framework	Single-GPU speed	Single-node 8-GPU speed	Speedup
MxNet	3349	26518	7.92
Paddle (before optimization)	3009	23141	7.69
Paddle (after optimization)	3009	23559	7.83
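
As a check on the table, the speedup column is consistent with dividing the 8-GPU speed by the single-GPU speed. This reading of the column is inferred from the figures and is not stated explicitly in the PR:

```latex
\[
\text{speedup} = \frac{\text{8-GPU speed}}{\text{single-GPU speed}}, \qquad
\frac{26518}{3349} \approx 7.92, \quad
\frac{23141}{3009} \approx 7.69, \quad
\frac{23559}{3009} \approx 7.83
\]
```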

paddle-bot-old · 2021-07-27T09:08:19Z

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

wangxicoding · 2021-07-29T08:07:05Z

paddle/fluid/framework/ir/multi_devices_graph_pass/fix_op_run_order_pass.cc

+      }
+    }
+
+    VLOG(10) << "Found unchanged OpDesc " << op_to_idx.size() << ", new OpDesc "


Shouldn't this be node_to_idx.size()? op_to_idx belongs to the original program. You could also print the number of original OpDescs.

wangxicoding · 2021-07-29T08:42:23Z

paddle/fluid/framework/ir/multi_devices_graph_pass/fix_op_run_order_pass.cc

+      auto &pending_ops = graph_view.PendingOps(cur_op);
+      tmp_ops.clear();
+      for (auto *pending_op : pending_ops) {
+        if (visited_ops.count(pending_op) > 0) {


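In theory, an op should never be visited and executed more than once, right? Adding an assertion would be safer.

visited_ops is actually unnecessary; it has been removed.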

wangxicoding · 2021-07-29T09:00:55Z

paddle/fluid/framework/ir/multi_devices_graph_pass/fix_op_run_order_pass.cc

+    auto *prev_op = sorted_ops[i - 1];
+    auto *cur_op = sorted_ops[i];
+    auto *dep_var = new details::DummyVarHandle(graph->CreateControlDepVar());
+    graph->Get<details::GraphDepVars>(details::kGraphDepVars).emplace(dep_var);


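A small suggestion: you could check whether prev_op and cur_op already have a var dependency, and skip adding dep_var if they do.

That check is actually fairly complicated, because a var dependency can be indirect, spanning several levels. The proper approach would be to move over the ShrinkDepsFunctor code from reference_count_pass.cc. But since that is a fairly large change, it can be considered for a follow-up PR.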
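
To illustrate why the check involves indirect dependencies, here is a minimal sketch on a plain adjacency-set graph. It is not ShrinkDepsFunctor and not the pass's code: `Graph`, `Reaches`, and `ChainSortedOps` are hypothetical names, and a naive depth-first reachability search stands in for whatever a real implementation would use.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical graph model: g[i] holds the ops that must run after op i.
using Graph = std::vector<std::set<int>>;

// Returns true if `to` is reachable from `from` through any number of
// existing edges, i.e. `to` already depends on `from`, possibly indirectly.
bool Reaches(const Graph& g, int from, int to) {
  std::vector<bool> seen(g.size(), false);
  std::vector<int> stack = {from};
  while (!stack.empty()) {
    int cur = stack.back();
    stack.pop_back();
    if (cur == to) return true;
    if (seen[cur]) continue;
    seen[cur] = true;
    for (int next : g[cur]) stack.push_back(next);
  }
  return false;
}

// Chains consecutive ops of the fixed order with control edges, skipping
// pairs that are already ordered by an existing dependency path.
// Returns the number of control edges actually added.
std::size_t ChainSortedOps(Graph* g, const std::vector<int>& sorted_ops) {
  assert(g != nullptr);
  std::size_t added = 0;
  for (std::size_t i = 1; i < sorted_ops.size(); ++i) {
    int prev_op = sorted_ops[i - 1];
    int cur_op = sorted_ops[i];
    if (Reaches(*g, prev_op, cur_op)) continue;  // already ordered
    (*g)[prev_op].insert(cur_op);
    ++added;
  }
  return added;
}
```

On a diamond-shaped graph (0 → 1, 0 → 2, 1 → 3, 2 → 3) with the order 0, 1, 2, 3, only the pair (1, 2) needs a new edge. The search runs once per adjacent pair, which hints at why a cheaper, more careful approach was deferred.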

wangxicoding · 2021-07-29T09:18:49Z

paddle/fluid/framework/ir/multi_devices_graph_pass/fix_op_run_order_pass.cc

+      // sort next ready ops by node index
+      std::sort(tmp_ops.begin(), tmp_ops.end(), comp);
+      for (auto *op : tmp_ops) {
+        q.push(op);


In the fast_threaded_ssa_graph_executor scheduler, some ops used to be treated as high priority. Does that need to be included here?

Good point, added.
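
The ordering idea discussed in this thread can be sketched as a topological sort with deterministic tie-breaking. This is a simplified illustration on integer node ids, not the pass's implementation: `DeterministicTopoSort` and the `priority` vector are hypothetical, and the sketch assumes a smaller priority value means the op should run earlier among ops that become ready together.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// succ[i] lists the ops that depend on op i. Among ops that become ready
// at the same step, a smaller priority value goes first and ties are
// broken by node index, so every process derives the same order.
std::vector<int> DeterministicTopoSort(
    const std::vector<std::vector<int>>& succ,
    const std::vector<int>& priority) {
  const std::size_t n = succ.size();
  assert(priority.size() == n);

  std::vector<int> in_degree(n, 0);
  for (const auto& outs : succ) {
    for (int v : outs) ++in_degree[v];
  }

  auto comp = [&priority](int a, int b) {
    if (priority[a] != priority[b]) return priority[a] < priority[b];
    return a < b;
  };

  // Seed the queue with ops that have no dependencies.
  std::vector<int> tmp_ops;
  for (std::size_t i = 0; i < n; ++i) {
    if (in_degree[i] == 0) tmp_ops.push_back(static_cast<int>(i));
  }
  std::sort(tmp_ops.begin(), tmp_ops.end(), comp);
  std::queue<int> q;
  for (int op : tmp_ops) q.push(op);

  std::vector<int> order;
  while (!q.empty()) {
    int cur_op = q.front();
    q.pop();
    order.push_back(cur_op);

    // Collect ops whose last dependency was cur_op, then sort them.
    tmp_ops.clear();
    for (int pending_op : succ[cur_op]) {
      if (--in_degree[pending_op] == 0) tmp_ops.push_back(pending_op);
    }
    std::sort(tmp_ops.begin(), tmp_ops.end(), comp);
    for (int op : tmp_ops) q.push(op);
  }
  assert(order.size() == n);  // fails only if the graph has a cycle
  return order;
}
```

Because each op is enqueued exactly once, when its in-degree reaches zero, no separate visited set is needed in this sketch.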

wangxicoding · 2021-07-29T09:20:31Z

paddle/fluid/framework/details/fast_threaded_ssa_graph_executor.cc

      // add one more thread for generate op_deps
      prepare_pool_(1) {
+  if (ir::IsTopologySortOperationsUnique(*graph_)) {


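This scheduler could probably be simplified as well, by making it simply execute the ops sequentially 0.0

That already exists; traced_ops_ inside the executor handles this.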

wangxicoding

LGTM

add fix op run order pass

a92a842

sneaxiy and others added 5 commits July 28, 2021 13:53

Merge branch 'develop' into add_fix_op_run_order_pass

1a2bb03

add ut for fix_op_run_order

7d9312c

fix ci error

d8d7798

improve coverage

1ba619e

improve coverge again and fix cpu test case

c5d33f2

sneaxiy requested review from zhiqiu, gongweibao and wangxicoding July 29, 2021 07:01

wangxicoding reviewed Jul 29, 2021

follow some comments

e0be94b

wangxicoding approved these changes Jul 29, 2021

sneaxiy merged commit 79e758c into PaddlePaddle:develop Jul 29, 2021

sneaxiy deleted the add_fix_op_run_order_pass branch July 29, 2021 13:10
