
Feat multi input sharing graph, save and load compiled graph #9754

Merged
merged 28 commits into master from feat_multi_in on Feb 1, 2023

Conversation

strint (Contributor) commented on Jan 15, 2023:

Support sharing the compiled graph and variables.

  • Support re-inferring shapes on a graph after complete;
  • Split the processing between the pass-processed graph and runtime init into stages, so that some intermediate results can be extracted and some stages can be skipped;
  • Support passing in the previous graph's job, inferring the job from the new inputs, and initializing the runtime;
  • Support execution with inputs of new shapes; some shared data may need extra handling;
  • input/output/variable adaptation
    • inputs and outputs are new tensors, but keep the same names;
    • variable tensors are reused (this solves sharing of variables newly created by constant folding);
    • input shapes are constructed from the new input tensors, to support inferring outputs with new shapes;
  • Verify that memory sharing of parameters and activations across multiple graphs works correctly;
    • for activation memory sharing, the execution with the larger input must run first;
    • parameters, including those produced by constant folding, can be shared; this is achieved by sharing the compiled job and the parameters;
  • [Follow-up] If source ops are built according to input shapes, extra handling needs to be added
    • re-inference of free eager tensors is a bit trickier; check whether this occurs in SD

Support saving and loading runtime state to enable offline compilation (a usage sketch follows the list below).

  • runtime_state_dict
  • load_runtime_state_dict
  • work with graph sharing
  • flow.save/load runtime_state_dict
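
The following is a minimal usage sketch assembled from the test snippets later in this PR; MyGraph, model, x0 and x1 are placeholder names, and the exact call order is an assumption:

    import oneflow as flow

    class MyGraph(flow.nn.Graph):
        def __init__(self, model):
            super().__init__()
            self.model = model

        def build(self, x):
            return self.model(x)

    # Graph sharing: the second graph reuses the first graph's compiled job and variables.
    g0 = MyGraph(model)
    g0.enable_shared()            # allow g0 to be shared
    y0 = g0(x0)                   # first compile and run
    g1 = MyGraph(model)
    g1.share_from(g0)             # share g0's optimized job and parameters
    y1 = g1(x1)                   # x1 may have a new shape

    # Offline compilation: save and reload the runtime state.
    g2 = MyGraph(model)
    g2.enable_save_runtime_state_dict()
    y2 = g2(x0)
    state = g2.runtime_state_dict()
    flow.save(state, "g2_runtime")                        # persist to disk
    g3 = MyGraph(model)
    g3.load_runtime_state_dict(flow.load("g2_runtime"))   # run without recompiling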

strint marked this pull request as ready for review on January 15, 2023 16:57
strint changed the title from "Feat multi input of a graph" to "[WIP]Feat multi input of a graph" on Jan 15, 2023
strint requested review from leaves-zwx and removed the review requests for daquexian and BBuf on January 15, 2023 16:57
GetInputCriticalSectionCallbackBufferName(new_job_name));
} else if (buffer_name.rfind(kOutputCriticalSectionCallbackBufferNamePrefix, 0) == 0) {
op_conf.mutable_critical_section_callback_tick_conf()->set_buffer_name(
GetOutputCriticalSectionCallbackBufferName(new_job_name));
strint (author):

@chengtbf After discussion, we found that the hang was caused by the tick-related ops above: they all communicate through global buffers, and buffer communication needs independently generated buffer names.
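
For illustration, a minimal sketch of how these per-job buffer names are assumed to be formed, based on the prefixes visible elsewhere in this PR (e.g. "SourceTick-"); the Python helper below is hypothetical and only mirrors the C++ GetSourceTickBufferName:

    SOURCE_TICK_PREFIX = "SourceTick-"  # matches kSourceTickBufferNamePrefix in this PR

    def get_source_tick_buffer_name(job_name: str) -> str:
        # Assumption: buffer name = fixed prefix + job name, so giving the shared
        # graph a new job name yields new, independent global buffers.
        return SOURCE_TICK_PREFIX + job_name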

buffer_mgr->Get(GetSourceTickBufferName(job_name))->Push(job_instance);
LOG(INFO) << "vm run lazy " << job_name << " push source tick "
<< " run count " << run_cnt;
strint (author):

To be removed after debugging is done.

@@ -433,6 +442,9 @@ void Actor::ActUntilFail() {
AsyncRetInplaceConsumedRegstIfNoConsumer();

AsyncSendQueuedMsg();
LOG(INFO) << "Actor " << actor_id_ << " name " << op_name << " finish to act count "
<< act_cnt_;
++act_cnt_;
strint (author):

To be removed after debugging is done.

leaves-zwx (Contributor) commented:

Traceback (most recent call last):
  File "sd2_text2img.py", line 52, in <module>
    text_to_image(prompt, 512)
  File "sd2_text2img.py", line 32, in text_to_image
    def text_to_image(prompt, image_size, num_images_per_prompt=1, prefix=""):
  File "/home/zhangwenxiao/repos/oneflow/python/oneflow/autograd/autograd_mode.py", line 154, in wrapper
    return func(*args, **kwargs)
  File "/home/zhangwenxiao/repos/diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_oneflow.py", line 650, in __call__
    vae_post_process_graph.compile(latents)
  File "/home/zhangwenxiao/repos/diffusers/src/diffusers/oneflow_graph_compile_cache.py", line 29, in compile
    self.graph_._compile_from_shared(*args, **kwargs)
  File "/home/zhangwenxiao/repos/oneflow/python/oneflow/nn/graph/graph.py", line 838, in _compile_from_shared
    self._c_nn_graph.build_with_new_input_from_shared_graph(
oneflow._oneflow_internal.exception.RuntimeError: Error: Element number in input blob must be an integer multiple of reshape_conf, but got 4718592 and 2097152

The reshape problem showed up, inside SD2.

chengtbf (Contributor) commented:

oneflow._oneflow_internal.exception.RuntimeError: Error: Element number in input blob must be an integer multiple of reshape_conf, but got 4718592 and 2097152

The reshape problem showed up, inside SD2.

One possible solution: have the new graph run build one more time. This build would not trigger the subsequent job passes, but it would obtain new, valid shape-related configuration such as the reshape conf (random, input, etc.).

leaves-zwx (Contributor) replied:

One possible solution: have the new graph run build one more time. This build would not trigger the subsequent job passes, but it would obtain new, valid shape-related configuration such as the reshape conf (random, input, etc.).

That should work; the point of re-running build is exactly to refresh the attrs and so on. But how do we associate the newly built reshape (and similar) ops with the corresponding reshape ops in the old graph (whose new attrs need to be filled in)?

chengtbf (Contributor) replied:

One possible solution: have the new graph run build one more time. This build would not trigger the subsequent job passes, but it would obtain new, valid shape-related configuration such as the reshape conf (random, input, etc.).

That should work; the point of re-running build is exactly to refresh the attrs and so on. But how do we associate the newly built reshape (and similar) ops with the corresponding reshape ops in the old graph (whose new attrs need to be filled in)?

Associate them by creation order, just like how the input conf attr shapes are associated. It is only a bit more complicated. There are several options, for example (see the sketch after this list):

  • the original graph build records the op creation order, giving an order -> op_conf -> shape mapping
  • subsequent graph builds use that order to recover the mapping, giving a new op name -> order -> op_conf -> shape -> new_op_conf mapping
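
A minimal sketch of the order-based association described above; old_job and new_job are hypothetical Job protos that both come straight from nn.Graph.build, so their net.op lists are assumed to preserve the creation order:

    # Record the creation order of the original graph's ops.
    order_to_old_op = {i: op for i, op in enumerate(old_job.net.op)}

    # A later build with new input shapes emits ops in the same order,
    # so the i-th new op corresponds to the i-th old op.
    new_name_to_old_op = {}
    for i, new_op in enumerate(new_job.net.op):
        new_name_to_old_op[new_op.name] = order_to_old_op[i]
        # Shape-related attrs of the old op (e.g. a reshape conf) can now be
        # refreshed from new_op's attrs.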

leaves-zwx (Contributor) replied:

Associate them by creation order, just like how the input conf attr shapes are associated. It is only a bit more complicated. There are several options, for example:

  • the original graph build records the op creation order, giving an order -> op_conf -> shape mapping
  • subsequent graph builds use that order to recover the mapping, giving a new op name -> order -> op_conf -> shape -> new_op_conf mapping

Is this order stable across builds and after the passes run? Some passes rewrite the graph and would affect the order, right? Or do we ignore the passes entirely and only consider the original graph?

chengtbf (Contributor) replied:

Is this order stable across builds and after the passes run? Some passes rewrite the graph and would affect the order, right? Or do we ignore the passes entirely and only consider the original graph?

Only the original graph is considered; job pass rewrites are not (reshape will not be fused). In the original build logic, ops are triggered in the execution order of the Python script, which is the same every time.

def forward(self, x):
    y = self.linear(x)
    assert len(y.shape) == 2
    return flow.reshape(y, (y.shape[1], y.shape[0]))
strint (author):

The reshape test passes.

auto attr_iter = new_op_conf->user_conf().attr().find(pair.first);
CHECK_OR_RETURN(attr_iter != new_op_conf->user_conf().attr().end())
<< " There is not attr " << pair.first << " in new op " << new_op_conf->DebugString();
*pair.second.mutable_at_shape() = attr_iter->second.at_shape();
strint (author):

Update the shape attr.

NewOp4SharedOpName) {
// job is a copy from a shared graph.
// The job name has already been updated in the Python nn.Graph.
const auto& new_job_name = job->job_conf().job_name();
strint (author):

Restructured this a bit; it should be clearer now.

destination = OrderedDict()
destination._metadata = OrderedDict()

destination["graph_name"] = self.name
strint (author) commented on Jan 31, 2023:

runtime_state_dict stores the information the graph needs to execute at runtime (a rough sketch of its layout follows this list):

  • job name
  • job id
  • input tensors and their names
  • output tensors and their names
  • variable tensors and their names
  • plan
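
A hedged sketch of what such a runtime_state_dict roughly contains. It is an OrderedDict in the PR; only the "graph_name", "exe_plan" and "states" keys appear verbatim in the snippets in this thread, so the remaining key names and the exact value layout are assumptions:

    runtime_state = {
        "graph_name": "linear_g",              # job name
        "job_id": 0,                            # job id (assumed key name)
        "inputs": {"input.0.0": in_tensor},     # input tensors and their op names (assumed layout)
        "outputs": {"output.0.0": out_tensor},  # output tensors and their op names (assumed layout)
        "states": state_tensors,                # variable tensors and their names
        "exe_plan": serialized_plan,            # the compiled execution plan
    }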

leaves-zwx (Contributor) replied:

  • input tensors and their names
  • output tensors and their names

Output tensors and names? Why do we need to save the input and output tensors at all? Is this really just saving the tensor meta?

strint (author) replied:

Yes, in fact only the meta is needed. But the machinery for saving tensors is mature, so the tensors are saved directly.

Inside the C NNGraph only the meta is actually used as well, but it is likewise stored as tensors.

# Create a c nn graph to run with lazy runtime.
self._c_nn_graph = oneflow._oneflow_internal.nn.graph.CNNGraph(
    self._name,
    state_dict["exe_plan"],
strint (author):

When loading the graph, the plan is passed in directly.

github-actions bot commented:

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 143.0ms (= 14301.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 166.5ms (= 16654.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 166.5ms / 143.0ms)

OneFlow resnet50 time: 87.9ms (= 8789.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 104.6ms (= 10458.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 104.6ms / 87.9ms)

OneFlow resnet50 time: 60.3ms (= 12069.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.5ms (= 15896.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 79.5ms / 60.3ms)

OneFlow resnet50 time: 46.2ms (= 9248.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.8ms (= 15965.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.73 (= 79.8ms / 46.2ms)

OneFlow resnet50 time: 41.8ms (= 8353.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.1ms (= 13612.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.63 (= 68.1ms / 41.8ms)

github-actions bot commented:

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9754/

"OutputCriticalSectionCallback-";
static const std::string kInputBufferNamePrefix = "Input-";
static const std::string kOutputBufferNamePrefix = "Output-";
static const std::string kSourceTickBufferNamePrefix = "SourceTick-";
strint (author):

These names are used outside this file, so they were moved outside the function.

: name_(name),
job_id_(job_id),
session_ctx_(session_ctx),
plan_(plan),
strint (author):

Support initializing NNGraph from a plan.

@@ -28,7 +29,15 @@ class JobCompleter final {
JobCompleter() = default;
~JobCompleter() = default;

Maybe<void> Complete(Job* job) const;
static Maybe<void> Complete(Job* job);
strint (author):

The Complete function was changed to static.

@@ -262,8 +262,8 @@ def is_deprecated(func_or_class):
import oneflow.framework.session_context as session_ctx
from oneflow.framework.tensor_str import set_printoptions

__oneflow_global_unique_env = env_util.GetEnv()
session_ctx.NewDefaultSession(__oneflow_global_unique_env)
_oneflow_global_unique_env = env_util.GetEnv()
strint (author):

Creating a new session depends on env, so the double underscore prefix was removed so that other modules can access env.

@@ -910,7 +1207,13 @@ def __build_graph(self, *args, **kwargs):
with graph_build_util.graph_build_context(self.config.proto, self._session):
# Deal with inputs
self.__print(0, 1, self._shallow_repr() + " start building graph inputs.")
arg_op_names, lazy_args, lazy_kwargs, self._args_repr, _ = self.__build_io(
(
self._input_op_names,
leaves-zwx (Contributor) commented on Feb 1, 2023:

Isn't this field only present when enable_save_runtime_state_dict is true?

strint (author) replied:

Right, I'll change it back; this previously did not use the share switch and just saved it unconditionally.

output_op_names,
self._eager_outputs,
self._output_op_names,
self._build_eager_outputs,
Contributor:

I don't see where this field is set.

Contributor:

Also the out2name below.

strint (author) replied:

It is set above, but not very clearly; I'll adjust it.

_, # empty kwargs return
outs_repr,
out2name,
) = self.__build_io("output", graph_build_util.build_graph_output, *outputs)
Contributor:

The outputs were converted to a tuple earlier; what is the point of unpacking them again here?

strint (author) replied:

This is how __build_io is designed. Building into a tuple assumes the input takes the form (args, kwargs), which is more general.

Contributor:

Passing outputs in directly would also fit that design, right? outputs would just be parsed as one of the args.

strint (author) replied:

Now I remember: it is there to handle an edge case: #7539

}
return std::make_shared<NNGraph>(name, job, job_id, session_ctx);
}))
.def(py::init([](const std::string& name, const std::string& serialized_plan, int64_t job_id,
const std::shared_ptr<MultiClientSessionContext>& session_ctx,
Contributor:

Is this ctx meant to restore the session that existed at save time?

strint (author) replied:

Is this ctx meant to restore the session that existed at save time?

This ctx was introduced by the earlier PR that releases graph/session/env via reference counting; this PR only adds a constructor that builds the C NNGraph from a plan.


// NOTE(chengcheng): Singleton<JobDesc> need be clear before GlobalJobDescScope construct.
if (Singleton<JobDesc>::Get() != nullptr) { Singleton<JobDesc>::Delete(); }
Contributor:

Where did this line of logic get moved to?

Maybe<void> NNGraph::BuildWithNewInputFromSharedGraph(
const std::vector<std::string>& shared_inputs_op_names,
const std::vector<std::shared_ptr<one::Tensor>>& new_input_tensors,
const std::vector<std::string>& shared_op_names, const std::string& new_serialized_job) {
Contributor:

What is the shared_op_names parameter for? Isn't all of it already in new_serialized_job?

strint (author) replied:

What is the shared_op_names parameter for? Isn't all of it already in new_serialized_job?

        shared_op_names = []
        for op_idx in range(len(self._forward_job_proto.net.op)):
            shared_op_names.append(
                self._shared_graph._forward_job_proto.net.op[op_idx].name
            )

shared_op_names comes from the original logical graph produced directly by build, whereas new_serialized_job already contains the optimized graph.
In the optimized graph there is no longer any guarantee on op order.

Contributor:

new_serialized_job already contains the optimized graph

Then how do we guarantee that shared_op_names and new_serialized_job stay consistent? new_serialized_job may no longer contain some of the ops listed in shared_op_names.

for (int64_t idx = 0; idx < shared_inputs_op_names.size(); ++idx) {
input_name2tensor.emplace(shared_inputs_op_names[idx], new_input_tensors[idx]);
}
const auto& InputTensor4Name =
Contributor:

Shouldn't this be placed inside RegisterInputOpNamesAndTensors?

strint (author) replied:

Shouldn't this be placed inside RegisterInputOpNamesAndTensors?

InputTensor4Name is only used below; RegisterInputOpNamesAndTensors above does not depend on this lookup, so it is written so that it can be released as soon as it has been used.

for (int64_t op_idx = 0; op_idx < shared_op_names.size(); ++op_idx) {
// Assume that the new graph and the shared graph from nn.Graph.build have the same op order.
const auto& op = new_build_job.mutable_net()->mutable_op()->at(op_idx);
shared_op_name2_new_op.emplace(shared_op_names[op_idx], &op);
Contributor:

Actually, couldn't new_build_job just be passed directly to CompleteSharedGraphForNewInput?

strint (author) replied:

Actually, couldn't new_build_job just be passed directly to CompleteSharedGraphForNewInput?

This map actually goes from the shared op name to the op in the new build job.

The op order is used in between to make the correspondence: shared op name -> op order -> op in the new build job.

This prepares for modifying the shared graph's op attrs later on, so passing only new_build_job is not enough.

I'll rename this and add a comment.

Contributor:

so passing only new_build_job is not enough

My understanding is that this is because new_build_job does not carry the op order?

strint (author) replied on Feb 1, 2023:

new_build_job is a temporary job produced by the new build call (it was renamed later); it only serves as a dictionary of the new graph's attrs.

So the op name information has to be passed in addition, to maintain the correspondence between the new and old ops (a sketch of this mapping follows below).
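
A minimal Python-level sketch of the correspondence described above; shared_job and new_build_job are assumed to be Job protos whose net.op lists both preserve the nn.Graph.build order, and the attr-refresh step is simplified:

    # Op names of the original (shared) graph, in build order.
    shared_op_names = [op.name for op in shared_job.net.op]

    # Assumption: the new build emits ops in the same order, so the op at
    # index i in new_build_job corresponds to shared_op_names[i].
    shared_name_to_new_op = {
        shared_op_names[i]: new_op for i, new_op in enumerate(new_build_job.net.op)
    }

    # Shape-related attrs of the shared, optimized job (e.g. a reshape op's
    # shape attr) can then be refreshed from the matching new op's attrs.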

state_dict["states"]
)
if type(self) != Graph:
# Graph init with eager module, try to share mem with eager module
Contributor:

Isn't this state dict loaded when loading the plan? Why does it still need to deal with the eager module?

strint (author) replied:

    if not load_with_eager:
        # graph initialized without an eager module
        linear_g = flow.nn.Graph()
    else:
        # graph initialized with an eager module
        class LinearGraph(flow.nn.Graph):
            def __init__(self):
                super().__init__()
                self.my_linear = linear_reshape

            def build(self, x):
                return self.my_linear(x)

        linear_g = LinearGraph()
    # load the runtime state
    linear_g.load_runtime_state_dict(state_dict_list[0])

When testing SD, it used the graph initialization that carries an eager module.

We found that if parameter sharing with the eager module is not handled, there is an extra 1.8 GB of GPU memory overhead, so handling for this case was added.

):
if self._enable_save_runtime_state_dict or self._enable_shared_from_this:
self._input_op_names = input_op_names
self._output_op_names = output_op_names
Contributor:

Is there any cost to saving these by default (without gating on _enable_save_runtime_state_dict)?

strint (author) replied:

The gating was mainly there because of the extra cost of saving tensors, e.g. keeping _inputs_tensor_tuple uses extra GPU memory.

And since that was being considered anyway, everything got gated.

) = oneflow._oneflow_internal.DumpVariableTensorMgr()
self._state_tensor_tuple = convert_to_tensor_tuple(state_tensors)
self._state_tensor_tuple = convert_to_tensor_tuple(self._state_tensors)
Contributor:

It looks like _state_tensors could be a temporary variable and need not be kept on self?

I only see runtime_state_dict using _state_op_names and _state_tensor_tuple.

strint (author) replied:

Indeed; it has been removed.


github-actions bot commented Feb 1, 2023

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 141.9ms (= 14194.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 165.0ms (= 16503.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 165.0ms / 141.9ms)

OneFlow resnet50 time: 87.6ms (= 8762.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 105.5ms (= 10552.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 105.5ms / 87.6ms)

OneFlow resnet50 time: 60.2ms (= 12033.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.5ms (= 15906.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 79.5ms / 60.2ms)

OneFlow resnet50 time: 45.7ms (= 9141.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.8ms (= 14158.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.55 (= 70.8ms / 45.7ms)

OneFlow resnet50 time: 42.8ms (= 8551.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.5ms (= 13503.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.58 (= 67.5ms / 42.8ms)


github-actions bot commented Feb 1, 2023

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9754/


github-actions bot commented Feb 1, 2023

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 141.2ms (= 14118.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 164.6ms (= 16458.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 164.6ms / 141.2ms)

OneFlow resnet50 time: 88.0ms (= 8798.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.4ms (= 10341.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 103.4ms / 88.0ms)

OneFlow resnet50 time: 60.3ms (= 12064.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 89.4ms (= 17885.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.48 (= 89.4ms / 60.3ms)

OneFlow resnet50 time: 44.8ms (= 8968.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.1ms (= 13828.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.54 (= 69.1ms / 44.8ms)

OneFlow resnet50 time: 39.7ms (= 7933.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.3ms (= 13466.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.70 (= 67.3ms / 39.7ms)


github-actions bot commented Feb 1, 2023

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9754/

chengtbf (Contributor) left a review comment:

Done reading; I have replied to everything.


github-actions bot commented Feb 1, 2023

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 141.2ms (= 14121.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 166.1ms (= 16612.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.18 (= 166.1ms / 141.2ms)

OneFlow resnet50 time: 87.4ms (= 8741.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 114.5ms (= 11447.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 114.5ms / 87.4ms)

OneFlow resnet50 time: 59.4ms (= 11876.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.1ms (= 15621.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 78.1ms / 59.4ms)

OneFlow resnet50 time: 46.1ms (= 9214.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.1ms (= 14228.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.54 (= 71.1ms / 46.1ms)

OneFlow resnet50 time: 41.3ms (= 8268.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.5ms (= 13508.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.63 (= 67.5ms / 41.3ms)


github-actions bot commented Feb 1, 2023

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9754/

mergify bot merged commit aef9981 into master on Feb 1, 2023
mergify bot deleted the feat_multi_in branch on February 1, 2023 11:29
return self.my_linear(x)

linear_g = LinearGraph()
linear_g.enable_shared()
strint (author):

The first graph: allow it to be shared.

test_case.assertTrue(np.array_equal(of_lazy_out.numpy(), of_eager_out.numpy()))

linear_g1 = LinearGraph()
linear_g1.share_from(linear_g)
strint (author):

The second graph: it shares the first graph's optimized graph and parameters.

return self.my_linear(x)

linear_g = LinearGraph()
linear_g.enable_save_runtime_state_dict()
strint (author):

For offline compilation, allow the graph to save its runtime state.

return_dict["save1"] = test_case1

state_dict_list = []
state_dict0 = linear_g.runtime_state_dict()
strint (author):

For offline compilation, fetch the graph's runtime state.

This state_dict can be saved with flow.save (a small save/load sketch follows below).
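
A minimal sketch of the save/load round trip; "saved_graph" is a placeholder path, and state_dict_list/LinearGraph are reused from the test snippets around this thread:

    # Persist the runtime state to disk as the offline compilation artifact.
    flow.save(state_dict_list, "saved_graph")

    # Later, possibly in another process: load it back and restore the graph
    # without recompiling.
    loaded = flow.load("saved_graph")
    linear_g = LinearGraph()
    linear_g.load_runtime_state_dict(loaded[0])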

linear_g = LinearGraph()
if with_share is True:
    linear_g.enable_shared()
linear_g.load_runtime_state_dict(state_dict_list[0])
strint (author):

For online loading, the state_dict is fetched from disk with flow.load(); then the graph just loads the runtime state.
