
fix cumsum op for API 2.0, optimize performance test=develop #25505

Merged (3 commits, Aug 10, 2020)

Conversation

@LutaoChu (Contributor) commented on Jul 13, 2020

PR types

Function optimization, Performance optimization

PR changes

OPs

Describe

1. Optimize forward performance. (The forward benchmark screenshot is omitted here.)
2. Optimize backward performance:

| | Before | After | Conclusion |
| --- | --- | --- | --- |
| Backward time | 313 ms | 0.185 ms | 1691x speedup |

3. Add the parameters `dtype` and `name`.
4. If `axis` is None, flatten the input before accumulating.
5. Support full negative indexing for the `axis` parameter (see the NumPy sketch after this list).
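The new axis/dtype semantics (items 3-5) follow NumPy's cumsum conventions, matching the new signature `cumsum(x, axis=None, dtype=None, name=None)` introduced in this PR. A minimal sketch of what each rule means, illustrated with NumPy:

import numpy as np

x = np.arange(6, dtype=np.int32).reshape(2, 3)

np.cumsum(x)                  # axis=None: the input is flattened first -> [0 1 3 6 10 15]
np.cumsum(x, axis=-1)         # negative axis: -1 addresses the last dimension
np.cumsum(x, dtype=np.int64)  # dtype: accumulate in a wider type to avoid overflow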

@paddle-bot-old
Thanks for your contribution!
Please wait for the CI result first. See the Paddle CI Manual for details.

"""
:alias_main: paddle.cumsum
:alias: paddle.cumsum,paddle.tensor.cumsum,paddle.tensor.math.cumsum
:old_api: paddle.fluid.layers.cumsum
Contributor: Delete this line.

Author: Deleted.

TCChenlong (Contributor) previously approved these changes on Jul 16, 2020:
LGTM

@@ -22,7 +22,14 @@ class CumOp : public framework::OperatorWithKernel {
  using framework::OperatorWithKernel::OperatorWithKernel;

  void InferShape(framework::InferShapeContext *ctx) const override {
    ctx->SetOutputDim("Out", ctx->GetInputDim("X"));
    if (ctx->Attrs().Get<bool>("flatten")) {
Contributor: This uses the newly added attribute — could it break compatibility here?
Consider the scenario of saving an inference_model after training with 1.8, then running inference with 2.0.

Author: Is the scenario in question only whether a static-graph model saved with 1.8 can run inference under the 2.0 static graph?

Author: Verified locally; this does not break compatibility.

@@ -37,6 +44,10 @@ class CumsumOpMaker : public framework::OpProtoAndCheckerMaker {
                 "dimension [default -1].")
        .SetDefault(-1)
        .EqualGreaterThan(-1);
    AddAttr<bool>("flatten",
Contributor: A new attribute — can compatibility be preserved?
Consider the scenario of saving an inference_model after training with 1.8, then running inference with 2.0.

Author: Verified locally; this does not break compatibility.
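One reason this typically holds: an op desc saved by 1.8 simply lacks the `flatten` key, and loading falls back to the registered default. A minimal sketch of that lookup pattern, assuming attribute resolution behaves like a default-on-missing dictionary read:

# Attributes recorded in a 1.8-era op desc (no "flatten" key yet).
op_attrs_from_18_model = {"axis": -1, "exclusive": False, "reverse": False}

# On load, a missing attribute resolves to its registered default (False here),
# so old programs keep their original cumsum semantics.
flatten = op_attrs_from_18_model.get("flatten", False)
assert flatten is False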

// size of the 'axis' dimension. Invalid in the reverse case because the
// Thrust APIs do not support it.
if (size == out_dims[axis] && !reverse) {
  if (exclusive) {
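For reference, the two scan flavors this branch dispatches between (e.g. thrust::inclusive_scan vs. thrust::exclusive_scan) differ only in whether an element contributes to its own output position — a NumPy sketch of the semantics:

import numpy as np

x = np.array([1, 2, 3, 4])

inclusive = np.cumsum(x)                              # [1, 3, 6, 10]
# Exclusive scan: shift right and seed with the identity (0 for sums).
exclusive = np.concatenate(([0], np.cumsum(x)[:-1]))  # [0, 1, 3, 6]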
Contributor: For the backward pass, is it worth adding a CUDA kernel to improve speed?

Author: Added a CUDA kernel for the backward pass; it yields a 1691x speedup.
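For context, the backward of cumsum is itself a cumsum over the reversed axis (each x[i] feeds every y[j] with j >= i), which is why the same GPU scan machinery can serve the gradient — a minimal NumPy sketch of the identity:

import numpy as np

def cumsum_grad(grad_out, axis=0):
    # grad_x[i] = sum_{j >= i} grad_out[j]: a cumsum over the reversed axis.
    g = np.flip(grad_out, axis=axis)
    return np.flip(np.cumsum(g, axis=axis), axis=axis)

# Check against the explicit suffix sums.
grad_y = np.random.rand(5)
manual = np.array([grad_y[i:].sum() for i in range(5)])
assert np.allclose(cumsum_grad(grad_y), manual)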

The cumulative sum of the elements along a given axis. The first element of the result is the same as the first element of the input.

Args:
x (Variable): Input of cumsum operator, the Tensor/LoDTensor needed to be cumsumed.
Contributor: Variable -> Tensor

Author: Done.

name(str, optional): Normally there is no need for user to set this property. For more information, please refer to :ref:`api_guide_Name`. The default value is None.

Returns:
Variable(Tensor/LoDTensor): The result of cumsum operator, output of cumsum operator.
Contributor: Variable -> Tensor

Contributor: Tensor, the result of cumsum operator, output of cumsum operator.

Author: Done.

@LutaoChu (Contributor, Author) commented on Aug 7, 2020

> Could you post the performance-comparison code? Thanks!

Sure. You need to run two cases, backward=True and backward=False; the backward time is then derived by subtracting the forward-only time from the backward-case time.

import numpy as np
import paddle
import paddle.fluid as fluid
import paddle.fluid.profiler as profiler


def test_speed(num_epochs=5, axis=0, backward=False):
    num_class = 3
    probas_data = np.random.random((2, 3, 1000, 1000)).astype(np.float32)
    labels_data = np.random.randint(0, num_class, [1, 1], dtype='int32')

    probas_shape = [3, 1000, 1000]  # per-sample shape; the batch dim is implicit
    labels_shape = list(labels_data.shape)
    probas = fluid.layers.data(name='p', shape=probas_shape, dtype='float32')
    labels = fluid.layers.data(name='l', shape=labels_shape, dtype='int32')

    param_attr = fluid.ParamAttr(
        name='conv2d.weight',
        initializer=fluid.initializer.ConstantInitializer(value=2.0))
    y_predict = fluid.layers.conv2d(
        input=probas, num_filters=num_class, filter_size=2, param_attr=param_attr)
    y_predict = paddle.reshape(y_predict, shape=[-1, 1])
    y_predict = paddle.cumsum(y_predict, axis=axis)  # the op under test

    labels = fluid.layers.cast(labels, dtype='float32')
    cost = fluid.layers.square_error_cost(input=y_predict, label=labels)
    loss = fluid.layers.mean(cost)

    if backward:
        sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.1)
        sgd_optimizer.minimize(loss)

    use_cuda = True
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    main_program = fluid.default_main_program()
    exe = fluid.Executor(place)
    exe.run(fluid.default_startup_program())
    # Warm-up runs, excluded from profiling.
    for i in range(5):
        exe.run(main_program,
                fetch_list=[],
                feed={'p': probas_data, 'l': labels_data},
                return_numpy=True)

    with profiler.profiler('GPU', 'total', '/tmp/profile') as prof:
        for i in range(num_epochs):
            exe.run(main_program,
                    fetch_list=[],
                    feed={'p': probas_data, 'l': labels_data},
                    return_numpy=True)


# test_speed(num_epochs=300)  # forward only
test_speed(num_epochs=300, backward=True)
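To turn the two profiler totals into the backward-only figure reported in the table above, subtract the forward-only run from the backward-enabled run (the values here are placeholders, not measurements):

t_forward_only = 0.0    # ms/iter from test_speed(..., backward=False) -- placeholder
t_with_backward = 0.0   # ms/iter from test_speed(..., backward=True)  -- placeholder
t_backward = t_with_backward - t_forward_only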

@LutaoChu force-pushed the cumsum_fix branch 2 times, most recently from 9068bc7 to 340c592, on August 7, 2020 at 05:33
@@ -1543,3 +1542,73 @@ def kron(x, y, name=None):
    out = helper.create_variable_for_type_inference(dtype=x.dtype)
    helper.append_op(type="kron", inputs={"X": x, "Y": y}, outputs={"Out": out})
    return out


def cumsum(x, axis=None, dtype=None, name=None):
Contributor: Add the @deprecated decorator to the fluid.layers.cumsum function to point out its relationship to the new API.

Author: The @deprecated decorator has been added in a separate PR, #26104.
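For context, such a decorator usually wraps the old entry point and warns callers toward the replacement — a generic, hypothetical sketch (not the actual Paddle helper, whose name and signature may differ):

import functools
import warnings

def deprecated(update_to):
    """Hypothetical deprecation decorator, for illustration only."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            warnings.warn("{} is deprecated, use {} instead".format(
                fn.__name__, update_to), DeprecationWarning, stacklevel=2)
            return fn(*args, **kwargs)
        return wrapper
    return decorate

@deprecated(update_to="paddle.cumsum")
def legacy_cumsum(x, axis=-1):  # stands in for fluid.layers.cumsum
    ...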

TCChenlong (Contributor) left a comment:
LGTM

def cumsum(x, axis=None, dtype=None, name=None):
"""
:alias_main: paddle.cumsum
:alias: paddle.cumsum,paddle.tensor.cumsum,paddle.tensor.math.cumsum
Contributor: The aliases don't need to be written at all; these two lines can be deleted.
