fix reduce min max backward bug #5651

simonJJJ · 2021-07-28T16:29:40Z

While looking at the reduce_min backward implementation, I found that OneFlow and PyTorch behave differently.

So I fixed it!

hjchen2 · 2021-07-29T01:43:09Z

oneflow/core/autograd/gradient_funcs/reduce_ops.cpp

  const auto& bcast_like =
      JUST(OpInterpUtil::Dispatch<Tensor>(*bcast_like_op_, {output, input}, bcast_attrs));
  const auto& bcast_eq =
      JUST(OpInterpUtil::Dispatch<Tensor>(*bcast_equal_op_, {input, bcast_like}));
  const auto& cast_like = JUST(OpInterpUtil::Dispatch<Tensor>(*cast_like_op_, {bcast_eq, input}));
+  const auto& reduce_sum_ =
+      JUST(OpInterpUtil::Dispatch<Tensor>(*reduce_sum_op_, {cast_like}, reduce_sum_attrs));


Please rewrite this with the functional interface:

const auto& reduce_sum_ = JUST(functional::ReduceSum(cast_like, ctx->axis, ctx->keepdims));

Please change the other places the same way. CastLike does not have a functional interface yet, so add one as well.

hjchen2 · 2021-07-29T14:04:23Z

oneflow/core/autograd/gradient_funcs/reduce_ops.cpp

@@ -19,6 +19,7 @@ limitations under the License.
 #include "oneflow/core/framework/op_expr.h"
 #include "oneflow/core/framework/op_expr_helper.h"


The op_expr_helper.h header can be removed.

hjchen2 · 2021-07-29T14:06:00Z

oneflow/core/autograd/gradient_funcs/reduce_ops.cpp

 };

 Maybe<void> ReduceMaxOrMinOp::Init(const OpExpr& op) {
  const auto* fw_op_expr = dynamic_cast<const UserOpExpr*>(&op);
  CHECK_NOTNULL_OR_RETURN(fw_op_expr);
  base_attrs_ = MakeAttrMapFromUserOpConf(fw_op_expr->proto());
-  const std::string& op_name = fw_op_expr->op_name();
-  bcast_like_op_ =
-      JUST(op_expr_helper::BroadcastLikeOp(/*axis=*/{-1}, GradientOpName(op_name + "_bcast_like")));


While you are at it, please also convert the other op_expr_helper::XXXOp usages in this file to functional.

hjchen2 · 2021-07-29T14:06:47Z

oneflow/core/autograd/gradient_funcs/reduce_ops.cpp

+  const auto& bcast_eq = JUST(functional::BroadcastEqual(input, bcast_like));
+  const auto& cast_like = JUST(functional::CastLike(bcast_eq, input));
+  const auto& reduce_sum_ = JUST(functional::ReduceSum(cast_like, ctx->axis, ctx->keepdims));
+  const auto& bcast_div_ = JUST(functional::BroadcastDiv(dy, reduce_sum_));


bcast_div_ -> bcast_div
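
For readers following the hunk above, here is a minimal standalone sketch of what the equality mask, the `ReduceSum` count, and the `BroadcastDiv` appear to compute for a max reduction: the upstream gradient is split evenly among all elements tied for the maximum. This is plain C++ with no OneFlow types. The helper name `ReduceMaxBackwardLastAxis`, the row-major `[rows x cols]` layout, and the final step that multiplies the quotient by the mask are assumptions for illustration; the hunk does not show that last step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not OneFlow API): backward of a max reduction over the
// last axis of a row-major [rows x cols] matrix.
//   input  : forward input, size rows * cols
//   output : forward result (per-row max), size rows
//   dy     : upstream gradient, size rows
// Returns dx with the same layout as input.
std::vector<float> ReduceMaxBackwardLastAxis(const std::vector<float>& input,
                                             const std::vector<float>& output,
                                             const std::vector<float>& dy,
                                             std::size_t rows, std::size_t cols) {
  assert(cols > 0);
  assert(input.size() == rows * cols);
  assert(output.size() == rows && dy.size() == rows);

  std::vector<float> dx(input.size(), 0.0f);
  for (std::size_t r = 0; r < rows; ++r) {
    // Mirrors broadcast_like + broadcast_equal + cast_like + reduce_sum:
    // count how many elements in this row equal the reduced value.
    float count = 0.0f;
    for (std::size_t c = 0; c < cols; ++c) {
      if (input[r * cols + c] == output[r]) { count += 1.0f; }
    }
    // Mirrors broadcast_div: each tied element receives dy / count.
    const float share = dy[r] / count;
    // Assumed final step: multiply the share by the 0/1 mask.
    for (std::size_t c = 0; c < cols; ++c) {
      if (input[r * cols + c] == output[r]) { dx[r * cols + c] = share; }
    }
  }
  return dx;
}
```

With input rows `{1, 3, 3}` and `{2, 0, 2}`, `output = {3, 2}` and `dy = {1, 4}`, this returns `{0, 0.5, 0.5, 2, 0, 2}`. Without the division by the tie count, each tied element would receive the full `dy` instead.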

github-actions · 2021-07-31T14:34:06Z

Speed stats:

GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 139.2ms (= 6958.9ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 126.8ms (= 6338.5ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 139.2ms / 126.8ms)

PyTorch resnet50 time: 84.3ms (= 4217.3ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.1ms (= 3705.4ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.14 (= 84.3ms / 74.1ms)

PyTorch resnet50 time: 58.8ms (= 2937.9ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 47.2ms (= 2360.5ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.24 (= 58.8ms / 47.2ms)

PyTorch resnet50 time: 49.1ms (= 2453.5ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 41.6ms (= 2078.5ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.18 (= 49.1ms / 41.6ms)

PyTorch resnet50 time: 42.6ms (= 2128.2ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 46.0ms (= 2302.3ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 0.92 (= 42.6ms / 46.0ms)

fix bug

f7299a7

simonJJJ added automerge bug eager op labels Jul 28, 2021

simonJJJ requested a review from hjchen2 July 28, 2021 16:29

hjchen2 reviewed Jul 29, 2021

simonJJJ added 2 commits July 29, 2021 13:08

add cast_like functor

a061a94

convert to functional impl

9446f9c

simonJJJ requested a review from hjchen2 July 29, 2021 05:10

hjchen2 reviewed Jul 29, 2021

convert to functional

358766c

hjchen2 approved these changes Jul 31, 2021

hjchen2 and others added 2 commits July 31, 2021 21:55

Merge branch 'master' into fix_reduce_minmax_backward

90b232f

Merge branch 'master' into fix_reduce_minmax_backward

f62e495

oneflow-ci-bot self-requested a review July 31, 2021 13:55

hjchen2 approved these changes Jul 31, 2021

oneflow-ci-bot merged commit c620c55 into master Jul 31, 2021

oneflow-ci-bot deleted the fix_reduce_minmax_backward branch July 31, 2021 14:45
