
rebase #5601

Merged
oneflow-ci-bot merged 48 commits into master from deduce_consistent_op_interpreter on Jul 30, 2021

Conversation

@lixinqi (Contributor) commented Jul 26, 2021

No description provided.


Maybe<void> CheckIsDeviceSupportedByOp(const ParallelDesc& parallel_desc,
                                       const std::string& op_type_name) {
  if (IsCpuOnly(op_type_name)) { CHECK_EQ_OR_RETURN(parallel_desc.device_tag(), "cpu"); }
Contributor:

Other devices (besides cpu and cuda) may be supported in the future, so this probably shouldn't check only for cpu.

Contributor Author:

We'll deal with other devices when support for them actually lands. Even if this check misses a case for now, nothing breaks.
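A minimal sketch of the more general check the reviewer has in mind, assuming a hypothetical registry query IsDeviceSupported (not an existing helper); the Maybe/CHECK_OR_RETURN machinery follows the snippet above:

Maybe<void> CheckIsDeviceSupportedByOp(const ParallelDesc& parallel_desc,
                                       const std::string& op_type_name) {
  // Ask the op registry which device tags this op supports instead of
  // special-casing "cpu". IsDeviceSupported is assumed, not a real API.
  CHECK_OR_RETURN(IsDeviceSupported(op_type_name, parallel_desc.device_tag()))
      << op_type_name << " does not support device " << parallel_desc.device_tag();
  return Maybe<void>::Ok();
}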

{
  // Infer OpArgMutConsistentTensorMeta.
  const auto& GetInputTensorMeta = [](int32_t i) {
    UNIMPLEMENTED();
@hjchen2 (Contributor) commented Jul 26, 2021:

InferLogicalShapeAndDType never calls this lambda; is that why it's a bare UNIMPLEMENTED() here?

Contributor Author:

Yes. This is a source op.
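For context, a source op has no inputs, so an input-indexed callback can never be reached. Schematically (the loop and names here are illustrative, not the PR's code):

// For a source op, input_size() == 0, so this loop body never runs and
// GetInputTensorMeta(i) is never invoked; UNIMPLEMENTED() is safe.
for (int32_t i = 0; i < user_op_expr.input_size(); ++i) {
  infer_ctx.SetInputTensorMeta(i, GetInputTensorMeta(i));
}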

@@ -37,7 +37,10 @@ namespace one {

namespace {

Maybe<Symbol<Device>> GetDefaultDevice() { return Device::New("cpu", 0); }
Maybe<Symbol<Device>> GetDefaultDevice(const OpExprInterpContext& ctx) {
  if (ctx.device.has_value()) { return ctx.device.value(); }
Contributor:

A global default_device_symbol_ would probably be better here. Also, calling flow.device("type:index") on the Python side shouldn't create a new device every time; it should return a singleton, since I've found flow.device("type:index") to be fairly expensive in Python.

Contributor Author:

Resolved.
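A sketch of the singleton idea under discussion: cache Device symbols by (type, index) so repeated lookups reuse one instance instead of constructing a new device each time. GetOrCreateDevice is an illustrative name, and Device::New is used as in the snippet above:

Maybe<Symbol<Device>> GetOrCreateDevice(const std::string& type, int64_t index) {
  // One cache per thread avoids locking; entries live for the thread's lifetime.
  static thread_local std::map<std::pair<std::string, int64_t>, Symbol<Device>> cache;
  const auto key = std::make_pair(type, index);
  auto it = cache.find(key);
  if (it == cache.end()) { it = cache.emplace(key, JUST(Device::New(type, index))).first; }
  return it->second;
}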

@@ -86,7 +93,9 @@ class TensorInfo final {
 private:
  std::shared_ptr<const Shape> shape_;
  DataType dtype_;
  // TODO: Add device info
  Maybe<Symbol<Device>> device_;  // for local tensor
Contributor:

This could be changed to Optional as well.

Contributor Author:

Resolved.
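For illustration, the member after the suggested change. Optional expresses "value may be absent" while Maybe is meant for error-or-value returns, which is why the reviewer prefers it here; the trailing comment is an assumption:

  std::shared_ptr<const Shape> shape_;
  DataType dtype_;
  Optional<Symbol<Device>> device_;  // set for local tensors only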

@@ -264,7 +264,11 @@
bind_python: True

- name: "constant"
signature: "Tensor Constant(*, Shape shape, Scalar value, DataType dtype)"
signature: "Tensor Constant(*, Shape shape, Scalar value, DataType dtype, Int64 device)"
Contributor:

device can default to None; following PyTorch's semantics, a None device should mean cpu:

signature: "Tensor Constant(*, Shape shape, Scalar value, DataType dtype, Int64 device=None)"

On the functor side it can be received as an Optional<Int64>.

Contributor Author:

Sounds good. I was actually having trouble handling the None case.

Contributor Author:

Resolved.
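A standalone sketch of the Optional-with-default pattern suggested above, with std::optional standing in for the framework's Optional<Int64> (ResolveDeviceId and the default id of 0 are illustrative):

#include <cstdint>
#include <iostream>
#include <optional>

// An absent device falls back to a default, mirroring "None means cpu".
int64_t ResolveDeviceId(const std::optional<int64_t>& device) {
  return device.has_value() ? *device : /*default cpu id*/ 0;
}

int main() {
  std::cout << ResolveDeviceId(std::nullopt) << "\n";  // prints 0 (default cpu)
  std::cout << ResolveDeviceId(3) << "\n";             // prints 3 (explicit device)
  return 0;
}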

}
const auto& parallel_distribution = JUST(MakeParallelDistribution(sbp_tuple));
if (!JUST(*Global<Maybe<bool>, MultiClient>::Get())) {
  JUST(attrs.SetAttr<std::string>("nd_sbp", parallel_distribution->DebugString()));
Contributor:

Using DebugString here feels a bit odd.

Contributor Author:

That's only because user_op_attr doesn't support the sbp types natively. If serialization is the only option, I'd rather serialize to text than to binary: the storage overhead is negligible, and readability matters far more.
I'll try changing this to PbMessage2TxtString.

Contributor:

Yes, PbMessage2TxtString works.

Contributor Author:

Resolved.
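For context, a PbMessage2TxtString-style helper boils down to protobuf's text format. A generic sketch (MessageToTxt is an illustrative name, not OneFlow's actual helper):

#include <string>
#include <google/protobuf/message.h>
#include <google/protobuf/text_format.h>

// Serialize any protobuf message to the human-readable text format
// instead of the binary wire format, trading a few bytes for readability.
std::string MessageToTxt(const google::protobuf::Message& msg) {
  std::string out;
  google::protobuf::TextFormat::PrintToString(msg, &out);
  return out;
}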

@hjchen2 hjchen2 self-requested a review July 26, 2021 13:27
@hjchen2 hjchen2 removed the automerge label Jul 26, 2021
@oneflow-ci-bot oneflow-ci-bot removed their request for review July 29, 2021 12:23
@oneflow-ci-bot oneflow-ci-bot self-requested a review July 29, 2021 12:23
@oneflow-ci-bot oneflow-ci-bot removed their request for review July 29, 2021 15:20
@oneflow-ci-bot oneflow-ci-bot self-requested a review July 29, 2021 17:08
@oneflow-ci-bot oneflow-ci-bot removed their request for review July 29, 2021 18:09
@oneflow-ci-bot oneflow-ci-bot self-requested a review July 30, 2021 06:28
Comment on lines +32 to +35
placement: flow.placement = None,
sbp: Union[
flow._oneflow_internal.sbp.sbp, List[flow._oneflow_internal.sbp.sbp]
] = None,
Contributor Author:

These configure placement and sbp.

def test_consistent_naive(test_case):
    placement = flow.placement("cpu", {0: [0]})
    sbp = (flow.sbp.broadcast,)
    x = flow.ones((16, 16), placement=placement, sbp=sbp)
Contributor Author:

An example.

@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot July 30, 2021 06:53
@github-actions (Contributor) commented:

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 140.3ms (= 7013.5ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 126.7ms (= 6333.8ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.11 (= 140.3ms / 126.7ms)

PyTorch resnet50 time: 81.2ms (= 4061.9ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.0ms (= 3701.4ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 81.2ms / 74.0ms)

PyTorch resnet50 time: 54.4ms (= 2718.4ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 47.5ms (= 2374.8ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.14 (= 54.4ms / 47.5ms)

PyTorch resnet50 time: 47.7ms (= 2386.0ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 43.4ms (= 2169.7ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 47.7ms / 43.4ms)

PyTorch resnet50 time: 41.3ms (= 2063.6ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 42.4ms (= 2119.6ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 0.97 (= 41.3ms / 42.4ms)

@oneflow-ci-bot oneflow-ci-bot merged commit bf4bdd6 into master Jul 30, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the deduce_consistent_op_interpreter branch July 30, 2021 09:04
Maybe<Symbol<cfg::ParallelDistribution>> MakeParallelDistribution(
    const std::vector<Symbol<cfg::SbpParallel>>& sbp_tuple) const {
  static thread_local std::map<std::vector<Symbol<cfg::SbpParallel>>,
                               Symbol<cfg::ParallelDistribution>>
@chengtbf (Contributor) commented Aug 3, 2021:

Shouldn't this function be extracted into a shared helper? It's needed by more than just this Constant functor. @lixinqi @hjchen2
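A sketch of that extraction: hoist the thread_local cache out of the functor into a free function any functor can call. GetOrCreateParallelDistribution and BuildParallelDistribution are placeholder names for illustration:

Maybe<Symbol<cfg::ParallelDistribution>> GetOrCreateParallelDistribution(
    const std::vector<Symbol<cfg::SbpParallel>>& sbp_tuple) {
  static thread_local std::map<std::vector<Symbol<cfg::SbpParallel>>,
                               Symbol<cfg::ParallelDistribution>>
      cache;
  auto it = cache.find(sbp_tuple);
  if (it == cache.end()) {
    // BuildParallelDistribution stands in for whatever the functor
    // currently does to assemble the proto from sbp_tuple.
    it = cache.emplace(sbp_tuple, JUST(BuildParallelDistribution(sbp_tuple))).first;
  }
  return it->second;
}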
