TaskNode::order_in_chain #10102

chengtbf · 2023-04-10T12:15:31Z

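Split out of the separate-compilation work:

Land the refactor and remove order_in_graph #9909 PR on master.

Depends on:

remove mem_chain merge #10097, which must be merged first.

Remove order_in_graph and use order_in_chain instead. When LogicalChainPass is enabled (separate compilation forces LogicalChain on), the logical chain writes order_in_logical_chain into each op. The order is then read from the logical graph, bypassing the topology information of the physical graph.
Refine the output of LightPlan.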
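A minimal sketch of the idea follows. It assumes a simplified, hypothetical `TaskNodeInfo` struct rather than OneFlow's real `TaskNode`. Once every task carries a `chain_id` and an `order_in_chain` taken from the logical chain, an execution order within each chain can be recovered by sorting on those two keys, without consulting the physical graph's topology.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical, simplified stand-in for a task node; OneFlow's real TaskNode differs.
struct TaskNodeInfo {
  std::string name;
  int64_t chain_id;        // which logical chain the task belongs to
  int64_t order_in_chain;  // position inside that chain, copied from the logical graph
};

// Orders tasks by (chain_id, order_in_chain) only; no physical-graph edges are consulted.
std::vector<TaskNodeInfo> OrderByChain(std::vector<TaskNodeInfo> tasks) {
  std::stable_sort(tasks.begin(), tasks.end(),
                   [](const TaskNodeInfo& a, const TaskNodeInfo& b) {
                     return std::tie(a.chain_id, a.order_in_chain)
                            < std::tie(b.chain_id, b.order_in_chain);
                   });
  return tasks;
}
```

For example, tasks `{"reduce_sum-12", 0, 7}`, `{"add_n-10", 0, 4}`, `{"reshape-11", 0, 5}` come out as add_n-10, reshape-11, reduce_sum-12. `std::stable_sort` keeps ties in input order, which is one simple way to stay deterministic if two tasks ever share the same key.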

chengtbf · 2023-04-10T12:16:58Z

oneflow/core/job/plan_util.cpp

+      file_stream << "i : " << std::to_string(i) << " , actor id : " << std::to_string(task_id)
+                  << " thrd : " << std::to_string(thrd_id) << " name : " << task_id2name.at(task_id)
+                  << "\n  chain_id : " << std::to_string(task->chain_id())
+                  << " order_in_chain : " << std::to_string(task->order_in_chain())


Added chain id information; the format changes as follows:

before:

order : 39 , actor id : 8796126576640 name : reduce_sum-12 thrd : 4194320 device_type : kCPU stream_index : 16 { consume : in : <- [ reshape-11/__out_0 ] ( actor_id: 8796124479488, regst: {regust_num: 1, device: cpu, time_shape: (1,1,4), shape: (16,), dtype: kFloat} ) produce : tmp regst: {regust_num: 1, device: cpu, time_shape: (1,1,4), shape: (64,), dtype: kChar} { } produce : __output_tensor_0 regst: {regust_num: 1, device: cpu, time_shape: (1,1,4), shape: (), dtype: kFloat} { -> [ pack-21 ] ( actor_id: 8796147548160 ) -> [ ones_like-13 ] ( actor_id: 8796128673792 ) } }

order : 40 , actor id : 8796147548160 name : pack-21 thrd : 4194330 device_type : kCPU stream_index : 26 { consume : in : <- [ reduce_sum-12/__output_tensor_0 ] ( actor_id: 8796126576640, regst: {regust_num: 1, device: cpu, time_shape: (1,1,4), shape: (), dtype: kFloat} ) produce : out regst: {regust_num: 1, device: cpu, time_shape: (1,1), shape: (4,), dtype: kFloat} { -> [ _LinearTrainGraph_0_output.0.0.1_4 ] ( actor_id: 8796149645312 ) } }

after:

i : 37 , actor id : 17592186044430 thrd : 8388608 name : add_n-10 chain_id : 0 order_in_chain : 4 device_type : kCUDA stream_index : 0 { consume : in : <- [ broadcast_add-5/__z_0 ] ( actor_id: 17592186044426, regst: {regust_num: 1, device: cuda, time_shape: (1,1,4), shape: (2,8), dtype: kFloat} ) consume : in : <- [ constant-8/__out_0 ] ( actor_id: 17592186044429, regst: {regust_num: 1, device: cuda, time_shape: (1,1,4), shape: (2,8), dtype: kFloat} ) produce : __out_0 regst: {regust_num: 1, device: cuda, time_shape: (1,1,4), shape: (2,8), dtype: kFloat} { -> [ reshape-11 ] ( actor_id: 17592186044431 ) } }

i : 38 , actor id : 17592186044431 thrd : 8388608 name : reshape-11 chain_id : 0 order_in_chain : 5 device_type : kCUDA stream_index : 0 { consume : in : <- [ add_n-10/__out_0 ] ( actor_id: 17592186044430, regst: {regust_num: 1, device: cuda, time_shape: (1,1,4), shape: (2,8), dtype: kFloat} ) produce : __out_0 regst: {regust_num: 1, device: cuda, time_shape: (1,1,4), shape: (16,), dtype: kFloat} { -> [ pack-20 ] ( actor_id: 17592186044440 ) -> [ broadcast_like-14 ] ( actor_id: 17592186044434 ) -> [ reduce_sum-12 ] ( actor_id: 17592186044432 ) } }

i : 39 , actor id : 17592186044432 thrd : 8388608 name : reduce_sum-12 chain_id : 0 order_in_chain : 7 device_type : kCUDA stream_index : 0 { consume : in_ctrl : <- [ pack-20/out_ctrl_103 ] ( actor_id: 17592186044440, regst: {regust_num: 1, device: cuda, ctrl} ) consume : in : <- [ reshape-11/__out_0 ] ( actor_id: 17592186044431, regst: {regust_num: 1, device: cuda, time_shape: (1,1,4), shape: (16,), dtype: kFloat} ) produce : __output_tensor_0 regst: {regust_num: 1, device: cuda, time_shape: (1,1,4), shape: (), dtype: kFloat} { -> [ pack-21 ] ( actor_id: 17592186044442 ) -> [ ones_like-13 ] ( actor_id: 17592186044433 ) } produce : tmp regst: {regust_num: 1, device: cuda, time_shape: (1,1,4), shape: (512,), dtype: kChar} { } }
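The header of each record in the new format can be sketched as follows. This assumes a hypothetical `LightPlanEntry` struct holding the fields that the new `plan_util.cpp` code prints. The real code streams into `file_stream` using `std::to_string` and continues with device, stream and register details, which are omitted here.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical container for the fields printed at the start of each LightPlan record.
struct LightPlanEntry {
  int64_t i;               // index of the task in the plan listing
  int64_t task_id;         // actor id
  int64_t thrd_id;         // thread id
  std::string name;        // task name
  int64_t chain_id;        // newly printed chain id
  int64_t order_in_chain;  // newly printed order within the chain
};

// Builds the header in the "after" layout: index, actor id, thread, name,
// then chain_id and order_in_chain on an indented second line.
std::string FormatHeader(const LightPlanEntry& e) {
  std::ostringstream os;
  os << "i : " << e.i << " , actor id : " << e.task_id << " thrd : " << e.thrd_id
     << " name : " << e.name << "\n  chain_id : " << e.chain_id
     << " order_in_chain : " << e.order_in_chain;
  return os.str();
}
```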

chengtbf · 2023-04-10T12:18:07Z

oneflow/core/graph/straighten_nodes.cpp

@@ -606,11 +603,7 @@ void StraightenNodes(TaskGraph* task_graph, std::vector<TaskNode*>* ordered_task

  std::vector<int32_t> remain_task_nums(num_classifier, 0);

-  auto SetOrderInGraph = [&](TaskNode* task_node) {


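Slightly refined the straightening algorithm on the physical graph and removed the order-in-graph concept; it now only provides the ordered task nodes. @Yipeng1994

Yes, I already saw this change in the big separate-compilation PR.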
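The interface change is that the straightening pass now just fills an ordered list of task nodes instead of writing an order-in-graph value onto each node. To illustrate only that output shape, here is a minimal sketch using a plain Kahn topological sort over a hypothetical adjacency-list graph (`SimpleTaskGraph` and `OrderTaskNodes` are illustrative names). It is not OneFlow's straightening algorithm, which additionally uses task classifiers and priorities.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical minimal task graph: node i has edges to out_edges[i].
struct SimpleTaskGraph {
  std::vector<std::vector<int>> out_edges;
};

// Fills *ordered_nodes with a topological order; no per-node order field is written.
// Returns false if the graph contains a cycle.
bool OrderTaskNodes(const SimpleTaskGraph& graph, std::vector<int>* ordered_nodes) {
  const std::size_t n = graph.out_edges.size();
  std::vector<int> in_degree(n, 0);
  for (const auto& outs : graph.out_edges) {
    for (int dst : outs) { ++in_degree[dst]; }
  }
  std::queue<int> ready;  // nodes whose producers have all been scheduled
  for (std::size_t i = 0; i < n; ++i) {
    if (in_degree[i] == 0) { ready.push(static_cast<int>(i)); }
  }
  ordered_nodes->clear();
  while (!ready.empty()) {
    int node = ready.front();
    ready.pop();
    ordered_nodes->push_back(node);
    for (int dst : graph.out_edges[node]) {
      if (--in_degree[dst] == 0) { ready.push(dst); }
    }
  }
  return ordered_nodes->size() == n;
}
```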

github-actions · 2023-04-11T00:02:44Z

Speed stats:

GPU Name: GeForce GTX 1080 

❌ OneFlow resnet50 time: 140.8ms (= 14077.7ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 152.2ms (= 15217.0ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.08 (= 152.2ms / 140.8ms)

OneFlow resnet50 time: 80.5ms (= 8050.5ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 91.1ms (= 9110.5ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.13 (= 91.1ms / 80.5ms)

OneFlow resnet50 time: 48.5ms (= 9708.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 71.7ms (= 14331.5ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.48 (= 71.7ms / 48.5ms)

OneFlow resnet50 time: 32.5ms (= 6500.9ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 52.5ms (= 10503.3ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.62 (= 52.5ms / 32.5ms)

OneFlow resnet50 time: 24.9ms (= 4981.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 63.0ms (= 12609.2ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 2.53 (= 63.0ms / 24.9ms)

OneFlow swin dataloader time: 0.238s (= 47.567s / 200, num_workers=1)
PyTorch swin dataloader time: 0.149s (= 29.810s / 200, num_workers=1)
Relative speed: 0.627 (= 0.149s / 0.238s)

OneFlow swin dataloader time: 0.069s (= 13.747s / 200, num_workers=4)
PyTorch swin dataloader time: 0.043s (= 8.670s / 200, num_workers=4)
Relative speed: 0.631 (= 0.043s / 0.069s)

OneFlow swin dataloader time: 0.042s (= 8.492s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.470s / 200, num_workers=8)
Relative speed: 0.526 (= 0.022s / 0.042s)

❌ OneFlow resnet50 time: 152.7ms (= 15271.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.6ms (= 16058.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.05 (= 160.6ms / 152.7ms)

OneFlow resnet50 time: 91.4ms (= 9138.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.5ms (= 10247.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.12 (= 102.5ms / 91.4ms)

OneFlow resnet50 time: 59.7ms (= 11944.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.0ms (= 15792.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 79.0ms / 59.7ms)

OneFlow resnet50 time: 43.1ms (= 8628.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.7ms (= 14344.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.66 (= 71.7ms / 43.1ms)

OneFlow resnet50 time: 36.8ms (= 7365.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.1ms (= 13610.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.85 (= 68.1ms / 36.8ms)

github-actions · 2023-04-11T00:06:49Z

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/10102/

oneflow/core/graph/task_graph.cpp

strint · 2023-04-11T01:54:06Z

oneflow/core/job/task.proto

-  map<string, RegstDescProto> produced_regst_desc = 8;
-  map<string, RegstDescIdSet> consumed_regst_desc_id = 9;
-  optional bool all_register_num_eq_one_hint = 10 [default = false];
+  required int64 chain_id = 10;


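These ids could in principle still be reused; for example, could this one keep using 6?

The proto has been changing all along, and compatibility has never been guaranteed.

These ids could in principle still be reused

We could use 6, but it would still be incompatible, because field 6 used to be task_set_info.

The intent is that if more fields such as xx_id are inserted later, they will not affect the fields that come after chain id. See how op_conf partitions field numbers for the different op types.

The proto has been changing all along, and compatibility has never been guaranteed.

Yes, and it will change substantially again later, because plan/job contain many redundant fields.

We can only guarantee usability within a major version. Across versions, just recompile the job/plan.

The plan cache could run some checks: if it finds that the stored plan is from an old version, it could automatically recompile and overwrite it.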
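The plan-cache check suggested above could look roughly like this. It is a sketch only: `CachedPlan`, `kPlanFormatVersion`, `LoadOrRecompile` and the `compile` callback are hypothetical names, not existing OneFlow APIs, and the sketch assumes the cached plan records the format version it was compiled with.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

// Hypothetical version stamp, bumped whenever the plan/job proto layout changes incompatibly.
constexpr int64_t kPlanFormatVersion = 2;

// Hypothetical cache entry: the serialized plan plus the version it was compiled with.
struct CachedPlan {
  int64_t format_version;
  std::string serialized_plan;
};

// Returns the cached plan if its version matches; otherwise recompiles and overwrites the entry.
std::string LoadOrRecompile(std::optional<CachedPlan>* cache,
                            const std::function<std::string()>& compile) {
  if (cache->has_value() && (*cache)->format_version == kPlanFormatVersion) {
    return (*cache)->serialized_plan;  // cache hit with a compatible version
  }
  std::string fresh = compile();  // stale or missing: rebuild from the job
  *cache = CachedPlan{kPlanFormatVersion, fresh};
  return fresh;
}
```

A stale entry triggers exactly one recompilation; later loads hit the refreshed cache.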

github-actions · 2023-04-11T07:02:54Z

Speed stats:

GPU Name: GeForce GTX 1080 

❌ OneFlow resnet50 time: 140.9ms (= 14087.0ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 145.0ms (= 14502.0ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.03 (= 145.0ms / 140.9ms)

OneFlow resnet50 time: 80.5ms (= 8050.1ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 91.9ms (= 9187.0ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.14 (= 91.9ms / 80.5ms)

OneFlow resnet50 time: 49.2ms (= 9836.3ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 72.2ms (= 14440.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.47 (= 72.2ms / 49.2ms)

OneFlow resnet50 time: 32.6ms (= 6526.8ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 64.0ms (= 12790.3ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.96 (= 64.0ms / 32.6ms)

OneFlow resnet50 time: 25.1ms (= 5010.2ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 64.4ms (= 12878.4ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 2.57 (= 64.4ms / 25.1ms)

OneFlow swin dataloader time: 0.243s (= 48.674s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.222s / 200, num_workers=1)
Relative speed: 0.621 (= 0.151s / 0.243s)

OneFlow swin dataloader time: 0.068s (= 13.628s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.267s / 200, num_workers=4)
Relative speed: 0.607 (= 0.041s / 0.068s)

OneFlow swin dataloader time: 0.042s (= 8.331s / 200, num_workers=8)
PyTorch swin dataloader time: 0.021s (= 4.236s / 200, num_workers=8)
Relative speed: 0.508 (= 0.021s / 0.042s)

❌ OneFlow resnet50 time: 152.4ms (= 15236.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 159.9ms (= 15990.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.05 (= 159.9ms / 152.4ms)

OneFlow resnet50 time: 91.0ms (= 9096.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 108.8ms (= 10884.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 108.8ms / 91.0ms)

OneFlow resnet50 time: 59.6ms (= 11921.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 80.7ms (= 16130.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 80.7ms / 59.6ms)

OneFlow resnet50 time: 43.5ms (= 8690.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 76.0ms (= 15204.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.75 (= 76.0ms / 43.5ms)

OneFlow resnet50 time: 37.4ms (= 7480.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 73.2ms (= 14632.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.96 (= 73.2ms / 37.4ms)

Yipeng1994

Did a quick test of straightening; it works correctly.

github-actions · 2023-04-11T07:15:08Z

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/10102/

github-actions · 2023-04-11T07:26:48Z

CI failed when running job: cpu-misc. PR label automerge has been removed

github-actions · 2023-04-11T08:28:39Z

CI failed when running job: cuda-module. PR label automerge has been removed

github-actions · 2023-04-11T09:10:49Z

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/10102/

chengtbf added 2 commits April 10, 2023 06:28

remove mem_chain merge

f1c2078

TaskNode::order_in_chain

9f6d21e

chengtbf added the enhancement, WIP, and graph labels Apr 10, 2023

chengtbf requested review from strint, leaves-zwx and Yipeng1994 April 10, 2023 12:15

chengtbf commented Apr 10, 2023

Base automatically changed from dev_cc_rm_mem_chain_merge to master April 10, 2023 20:23

chengtbf marked this pull request as ready for review April 10, 2023 22:03

Merge branch 'master' into dev_cc_order_in_chain

51efb80

chengtbf added the automerge label and removed the WIP label Apr 10, 2023

chengtbf requested a review from oneflow-ci-bot April 10, 2023 22:05

strint reviewed Apr 11, 2023

oneflow/core/graph/task_graph.cpp (outdated, resolved)

strint reviewed Apr 11, 2023

strint approved these changes Apr 11, 2023

chengtbf and others added 2 commits April 11, 2023 10:49

Update oneflow/core/graph/task_graph.cpp

2c73e74

Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>

Merge branch 'master' of https://github.com/Oneflow-Inc/oneflow into dev_cc_order_in_chain

e4d30df

leaves-zwx approved these changes Apr 11, 2023

refine task proto id

677ece4

chengtbf requested review from oneflow-ci-bot and removed request for oneflow-ci-bot April 11, 2023 03:28

Merge branch 'master' into dev_cc_order_in_chain

ec7d071

Yipeng1994 approved these changes Apr 11, 2023

github-actions bot removed the automerge label Apr 11, 2023

chengtbf merged commit b1e86f6 into master Apr 11, 2023

chengtbf deleted the dev_cc_order_in_chain branch April 11, 2023 09:42
